Common data types and metadata

Common file level metadata fields

Each derivative data file SHOULD be described by a JSON file provided as a sidecar or higher up in the hierarchy of the derived dataset (according to the Inheritance Principle). If a particular derivative includes REQUIRED metadata fields, a JSON file is REQUIRED. Each derivative type defines its own set of fields, but all of them share the following (non-required) ones:

Key name	Requirement Level	Data type	Description
Description	RECOMMENDED	string	Free-form natural language description. This describes the nature of the file.
Sources	OPTIONAL	array of strings	A list of files with the paths specified using BIDS URIs; these files were directly used in the creation of this derivative data file. For example, if a derivative A is used in the creation of another derivative B, which is in turn used to generate C in a chain of A->B->C, C should only list B in `"Sources"`, and B should only list A in `"Sources"`. However, in case both X and Y are directly used in the creation of Z, then Z should list X and Y in `"Sources"`, regardless of whether X was used to generate Y. Using paths specified relative to the dataset root is DEPRECATED.
RawSources	DEPRECATED	array of strings	A list of paths relative to dataset root pointing to the BIDS-Raw file(s) that were used in the creation of this derivative. This field is DEPRECATED, and this metadata SHOULD be recorded in the `Sources` field using BIDS URIs to distinguish sources from different datasets.

Examples

Preprocessed bold NIfTI file in the original coordinate space of the original run. The location of the file in the original dataset is encoded in the Sources metadata, and _desc-<label> (the last entity before the suffix) is used to prevent clashing with the original filename.

└─ sub-01/
   └─ func/
      ├─ sub-01_task-rest_desc-preproc_bold.nii.gz 
      └─ sub-01_task-rest_desc-preproc_bold.json

{
    "Sources": ["bids:raw:sub-01/func/sub-01_task-rest_bold.nii.gz"]
}

Note that "raw" must appear in the DatasetLinks metadata in dataset_description.json. For example, in the case that the given derivatives dataset is nested within the "derivatives" directory of a raw dataset, the entry in DatasetLinks may say: "raw": "../..".

If this file was generated with prior knowledge from additional sources, such as the same subject's T1w, then both files MAY be included in Sources.

{
    "Sources": [
        "bids:raw:sub-01/func/sub-01_task-rest_bold.nii.gz",
        "bids:raw:sub-01/anat/sub-01_T1w.nii.gz"
    ]
}

On the other hand, if a preprocessed version of the T1w image was used, and it also occurs in the derivatives, Sources may include both the local, derivative file, and the raw original file.

{
    "Sources": [
        "bids::sub-01/anat/sub-01_desc-preproc_T1w.nii.gz",
        "bids:raw:sub-01/func/sub-01_task-rest_bold.nii.gz"
    ]
}
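
The BIDS URIs used in the Sources examples above can be mapped to file paths with a few lines of code. The following is a minimal sketch, not a reference implementation: the function name `resolve_bids_uri` and the `dataset_root` argument are hypothetical, and the sketch assumes that every DatasetLinks value is a relative POSIX path such as "../.." (remote or absolute URIs are not handled).

```python
import posixpath


def resolve_bids_uri(uri, dataset_links, dataset_root):
    """Resolve a BIDS URI of the form bids:<dataset-name>:<relative-path>.

    dataset_links mirrors the DatasetLinks object of dataset_description.json.
    An empty dataset name ("bids::...") refers to the current dataset.
    Assumption: DatasetLinks values are relative POSIX paths.
    """
    parts = uri.split(":", 2)
    if len(parts) != 3 or parts[0] != "bids":
        raise ValueError(f"not a BIDS URI: {uri!r}")
    _, name, rel_path = parts
    if name == "":
        # Local reference: path is relative to the current dataset root.
        base = dataset_root
    elif name in dataset_links:
        # Named reference: look up the dataset location in DatasetLinks.
        base = posixpath.join(dataset_root, dataset_links[name])
    else:
        raise KeyError(f"{name!r} is not declared in DatasetLinks")
    return posixpath.normpath(posixpath.join(base, rel_path))
```

With a derivatives dataset nested two levels below the raw dataset and "raw": "../.." in DatasetLinks, "bids:raw:sub-01/func/sub-01_task-rest_bold.nii.gz" resolves to a path inside the raw dataset, while a "bids::" URI stays inside the derivatives dataset.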

Spatial references

Derivatives are often aligned to a common spatial reference to allow for the comparison of acquired data across runs, sessions, subjects or datasets. A file may indicate the spatial reference to which it has been aligned using the space entity and/or the SpatialReference metadata.

The space entity may take any value in Image-Based Coordinate Systems.

If the space entity is omitted, or the space is not in the Standard template identifiers table, then the SpatialReference metadata is REQUIRED.

Key name	Requirement Level	Data type	Description
SpatialReference	RECOMMENDED if the derivative is aligned to a standard template listed in Standard template identifiers. REQUIRED otherwise.	string or object	String or object describing the reference image to which the current image is aligned. For images with a single reference, the value MUST be a single string. For images with multiple references, such as surface and volume references, a JSON object MUST be used.
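
The requirement rule above can be expressed as a small check on the filename. This is a sketch under stated assumptions: the function name is hypothetical, `STANDARD_SPACES` holds only an illustrative subset of the Standard template identifiers table, and the space label is assumed to be alphanumeric.

```python
import re

# Illustrative subset only; the authoritative list is the
# Standard template identifiers table.
STANDARD_SPACES = {"MNI152NLin6Asym", "MNI305", "fsLR", "fsaverage"}


def spatial_reference_is_required(filename, standard_spaces=STANDARD_SPACES):
    """Return True if SpatialReference is REQUIRED for this derivative file."""
    match = re.search(r"(?:^|_)space-([a-zA-Z0-9]+)", filename)
    if match is None:
        # The space entity is omitted.
        return True
    # The space entity is present: required only for non-standard spaces.
    return match.group(1) not in standard_spaces
```

When the function returns False, SpatialReference is still RECOMMENDED.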

SpatialReference key allowed values

Value	Description
`"orig"`	A (potentially unique) per-image space. Useful for describing the source of transforms from an input image to a target space.
URI	This can be used to point to a specific file. Paths written relative to the root of the derivative dataset are DEPRECATED in favor of BIDS URIs.

In the case of images with multiple references, an object must link the relevant structures to reference files. If a single volumetric reference is used for multiple structures, the VolumeReference key MAY be used to reduce duplication. For CIFTI-2 images, the relevant structures are BrainStructure values defined in the BrainModel elements found in the CIFTI-2 header.

Examples

Preprocessed bold NIfTI file in individual coordinate space. Note that in this case the SpatialReference key is REQUIRED.

└─ sub-01/
   └─ func/
      ├─ sub-01_task-rest_space-individual_bold.nii.gz 
      └─ sub-01_task-rest_space-individual_bold.json

{
    "SpatialReference": "bids::sub-01/anat/sub-01_desc-combined_T1w.nii.gz"
}

Preprocessed bold CIFTI-2 files that have been sampled to the fsLR surface meshes defined in the Conte69 atlas along with the MNI152NLin6Asym template. In this example, because all volumetric structures are sampled to the same reference, the VolumeReference key is used as a default, and only the surface references need to be specified by BrainStructure names. The references are given here as "https" URIs.

└─ sub-01/
   └─ func/
      ├─ sub-01_task-rest_space-fsLR_den-91k_bold.dtseries.nii 
      └─ sub-01_task-rest_space-fsLR_den-91k_bold.json

{
    "SpatialReference": {
        "VolumeReference": "https://templateflow.s3.amazonaws.com/tpl-MNI152NLin6Asym_res-02_T1w.nii.gz",
        "CIFTI_STRUCTURE_CORTEX_LEFT": "https://github.com/mgxd/brainplot/raw/master/brainplot/Conte69_Atlas/Conte69.L.midthickness.32k_fs_LR.surf.gii",
        "CIFTI_STRUCTURE_CORTEX_RIGHT": "https://github.com/mgxd/brainplot/raw/master/brainplot/Conte69_Atlas/Conte69.R.midthickness.32k_fs_LR.surf.gii"
    }
}

Preprocessed or cleaned data

Template:

<pipeline-name>/
    sub-<label>/
        [ses-<label>/]
            <datatype>/
                <source-entities>[_space-<space>][_desc-<label>]_<suffix>.<extension>

Data is considered to be preprocessed or cleaned if the data type of the input, as expressed by the BIDS suffix, is unchanged. By contrast, processing steps that change the number of dimensions are likely to disrupt the propagation of the input's suffix, and the outcomes of such transformations generally cannot be considered preprocessed or cleaned data.

Examples of preprocessing:

Motion-corrected, temporally denoised, and transformed to MNI space BOLD series
Inhomogeneity corrected and skull stripped T1w files
Motion-corrected DWI files
Time-domain filtered EEG data
MaxFilter (for example, SSS) cleaned MEG data

The space entity is recommended to distinguish files with different underlying coordinate systems or registered to different reference maps. See Spatial references for details. The desc entity ("description") is a general purpose field with freeform values, which SHOULD be used to distinguish between multiple different versions of processing for the same input data.
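
As an illustration of the filename template above, the following sketch builds a derivative filename from a source filename by appending the optional space and desc entities before the suffix. The function name `derivative_filename` is hypothetical, and the sketch assumes that the source filename contains no directory components and that the extension starts at the first dot.

```python
def derivative_filename(source_filename, space=None, desc=None):
    """Insert optional space and desc entities before the suffix.

    Follows <source-entities>[_space-<space>][_desc-<label>]_<suffix>.<extension>.
    """
    # Split off the extension at the first dot (handles ".nii.gz").
    stem, dot, extension = source_filename.partition(".")
    # The last underscore-separated part of the stem is the suffix.
    *entities, suffix = stem.split("_")
    if space is not None:
        entities.append(f"space-{space}")
    if desc is not None:
        entities.append(f"desc-{desc}")
    return "_".join(entities + [suffix]) + dot + extension
```

The suffix and extension are carried over unchanged, which matches the definition of preprocessed or cleaned data given above.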

Examples of preprocessed data:

└─ pipeline1/
   └─ sub-001/
      ├─ anat/
      │  ├─ sub-001_space-MNI305_T1w.nii.gz 
      │  └─ sub-001_space-MNI305_T1w.json 
      └─ func/
         ├─ sub-001_task-rest_run-1_space-MNI305_desc-preproc_bold.nii.gz 
         └─ sub-001_task-rest_run-1_space-MNI305_desc-preproc_bold.json

└─ pipeline2/
   └─ sub-001/
      └─ eeg/
         ├─ sub-001_task-listening_run-1_desc-autoannotation_events.tsv 
         ├─ sub-001_task-listening_run-1_desc-autoannotation_events.json 
         ├─ sub-001_task-listening_run-1_desc-filtered_eeg.edf 
         └─ sub-001_task-listening_run-1_desc-filtered_eeg.json

All REQUIRED metadata fields coming from a derivative file's source file(s) MUST be propagated to the JSON description of the derivative unless the processing makes them invalid (for example, if a source 4D image is averaged to create a single static volume, a RepetitionTime property would no longer be relevant).
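
The propagation rule above could be sketched as follows. The function name and both of its key arguments are hypothetical: which fields are REQUIRED depends on the source data type, and which fields a processing step invalidates must be decided by the pipeline author.

```python
def propagate_required_metadata(source_metadata, required_keys, invalidated_keys=()):
    """Copy REQUIRED source fields that the processing has not invalidated.

    required_keys: names of fields that are REQUIRED for the source file.
    invalidated_keys: names of fields made invalid by the processing step.
    """
    required = set(required_keys)
    invalid = set(invalidated_keys)
    return {
        key: value
        for key, value in source_metadata.items()
        if key in required and key not in invalid
    }
```

For the averaging example in the text, passing "RepetitionTime" in `invalidated_keys` drops that field from the derivative's JSON description while the remaining REQUIRED fields are kept.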

descriptions.tsv

Template:

[sub-<label>/]
    [ses-<label>/]
        [sub-<label>_][ses-<label>_]descriptions.tsv
        [sub-<label>_][ses-<label>_]descriptions.json

Optional: Yes

To keep a record of processing steps applied to the data, a descriptions.tsv file MAY be used. The descriptions.tsv file consists of one row for each unique desc-<label> entity used in the dataset and a set of REQUIRED and OPTIONAL columns:

Column name	Requirement Level	Data type	Description
desc_id	REQUIRED	string	A `desc-<label>` entity present in the derivatives dataset. The `desc_id` column contains the labels used with the `desc` entity, within the particular nesting that the `descriptions.tsv` file is placed. For example, if the `descriptions.tsv` file is placed at the root of the derivative dataset, its `desc_id` column SHOULD contain all labels of the `desc` entity used across the entire derivative dataset. Values in `desc_id` MUST be unique. This column must appear first in the file.
description	REQUIRED	string	Free-form text description of the entity's label (defined in `<entity>_id` column). The corresponding label column is `desc_id`. This column must appear second in the file.
Additional Columns	OPTIONAL	`n/a`	Additional columns are allowed.

This file MAY be located at the root of the derivative dataset, or at the subject or session level (Inheritance Principle).

The use of descriptions.tsv files together with the desc entity is helpful to document how files are generated, even if their use may not be sufficient to provide full computational reproducibility.

Example use of a `descriptions.tsv` file

├─ raw/
│  ├─ CHANGES 
│  ├─ README 
│  ├─ channels.tsv 
│  ├─ dataset_description.json 
│  ├─ participants.tsv 
│  └─ sub-001/
│     └─ eeg/
│        ├─ sub-001_task-listening_events.tsv 
│        ├─ sub-001_task-listening_events.json 
│        ├─ sub-001_task-listening_eeg.edf 
│        └─ sub-001_task-listening_eeg.json 
└─ derivatives/
   ├─ descriptions.tsv 
   └─ sub-001/
      └─ eeg/
         ├─ sub-001_task-listening_desc-Filt_eeg.edf 
         ├─ sub-001_task-listening_desc-Filt_eeg.json 
         ├─ sub-001_task-listening_desc-FiltDs_eeg.edf 
         ├─ sub-001_task-listening_desc-FiltDs_eeg.json 
         ├─ sub-001_task-listening_desc-preproc_eeg.edf 
         └─ sub-001_task-listening_desc-preproc_eeg.json

Contents of the descriptions.tsv file:

desc_id	description
desc-Filt	low-pass filtered at 30Hz
desc-FiltDs	low-pass filtered at 30Hz, downsampled to 250Hz
desc-preproc	low-pass filtered at 30Hz, downsampled to 250Hz, and rereferenced to a common average reference
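
A file like the one above can be read and checked against the column rules (desc_id first, description second, unique desc_id values) with the Python standard library. This is a minimal sketch, not a validator: the function name `read_descriptions` is hypothetical, and additional columns are simply ignored.

```python
import csv
import io


def read_descriptions(tsv_text):
    """Parse descriptions.tsv content into a {desc_id: description} mapping."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # desc_id must be the first column and description the second.
    if header[:2] != ["desc_id", "description"]:
        raise ValueError("first two columns must be desc_id and description")
    descriptions = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        desc_id, description = row[0], row[1]
        # Values in desc_id must be unique.
        if desc_id in descriptions:
            raise ValueError(f"duplicate desc_id: {desc_id!r}")
        descriptions[desc_id] = description
    return descriptions
```

The returned mapping can then be compared with the desc entities found in the dataset's filenames to spot labels that lack a description.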