Genetic Descriptor

Support for genetic descriptors was developed as a BIDS Extension Proposal. Please see Citing BIDS on how to appropriately credit this extension when referring to it in the context of the academic literature.

Genetic data are typically stored in dedicated repositories, separate from imaging data. A genetic descriptor links a BIDS dataset to associated genetic data, potentially in a separate repository, with details of where to find the genetic data and the type of data available.

Example datasets

The following example dataset with genetics data has been formatted using this specification and can be used for practical guidance when curating a new dataset.

Dataset Description

If information on associated genetic data is supplied as part of a BIDS dataset, these "genetic descriptors" are encoded as an additional, REQUIRED entry in the dataset_description.json file.

Datasets linked to a genetic database entry include the following REQUIRED and OPTIONAL keys in the Genetics sub-object of dataset_description.json:

Key name | Requirement Level | Data type | Description
Dataset | REQUIRED | string | URI where data can be retrieved.
Database | OPTIONAL | string | URI of database where the dataset is hosted.
Descriptors | OPTIONAL | string or array of strings | List of relevant descriptors (for example, journal articles) for dataset using a valid URI when possible.

Example:

{
  "Name": "Human Connectome Project",
  "BIDSVersion": "1.3.0",
  "License": "CC0",
  "Authors": ["1st author", "2nd author"],
  "Funding": ["P41 EB015894/EB/NIBIB NIH HHS/United States"],
  "Genetics": {
    "Dataset": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001364.v1.p1",
    "Database": "https://www.ncbi.nlm.nih.gov/gap/",
    "Descriptors": ["doi:10.1016/j.neuroimage.2013.05.041"]
  }
}
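
The key requirements listed for the Genetics sub-object can be checked mechanically. The following Python sketch is illustrative only and is not part of the specification: the function name validate_genetics is hypothetical, and the check covers only presence and data type, not whether the strings are valid URIs.

```python
import json


def validate_genetics(description):
    """Return a list of problems found in the Genetics sub-object (empty if none)."""
    problems = []
    genetics = description.get("Genetics")
    if genetics is None:
        # No genetic descriptor supplied, so there is nothing to check.
        return problems
    if not isinstance(genetics, dict):
        return ["Genetics must be a JSON object"]
    # Dataset is REQUIRED and must be a string.
    if not isinstance(genetics.get("Dataset"), str):
        problems.append("Genetics.Dataset is REQUIRED and must be a string")
    # Database is OPTIONAL, but must be a string when present.
    if "Database" in genetics and not isinstance(genetics["Database"], str):
        problems.append("Genetics.Database must be a string")
    # Descriptors is OPTIONAL: a string or an array of strings.
    if "Descriptors" in genetics:
        value = genetics["Descriptors"]
        is_string_list = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        if not (isinstance(value, str) or is_string_list):
            problems.append("Genetics.Descriptors must be a string or array of strings")
    return problems


# Genetics entry copied from the example above.
description = json.loads("""
{
  "Name": "Human Connectome Project",
  "BIDSVersion": "1.3.0",
  "Genetics": {
    "Dataset": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001364.v1.p1",
    "Database": "https://www.ncbi.nlm.nih.gov/gap/",
    "Descriptors": ["doi:10.1016/j.neuroimage.2013.05.041"]
  }
}
""")
print(validate_genetics(description))
```

Returning a list of messages rather than raising on the first failure lets a curator see every problem in one pass.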

Subject naming and Subjects file

If the same subjects have different identifiers in the genetic and imaging datasets, the column genetic_id SHOULD be added to the subjects.tsv file to associate the BIDS subject with a subject in the Genetics.Dataset referred to in the dataset_description.json file.

Information about the presence/absence of specific genetic markers MAY be duplicated in the subjects.tsv file by adding optional columns (like idh_mutation in the example below). Note that optional columns MUST be further described in an accompanying subjects.json file as described in Tabular files.

subjects.tsv example:

subject_id  age sex group   genetic_id  idh_mutation
sub-control01   34  M   control 124587  yes
sub-control02   12  F   control 548936  yes
sub-patient01   33  F   patient 489634  no
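
As a rough illustration of how the genetic_id column links the two datasets, the Python sketch below parses the table and builds a lookup from BIDS subject to genetic identifier. It also reports columns that lack an entry in the sidecar. The helper names and the sidecar content are hypothetical, and which columns count as optional is left to the caller.

```python
import csv
import io


def load_subjects(tsv_text):
    """Parse the content of a tab-separated subjects.tsv into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def genetic_id_map(rows):
    """Map each subject_id to its genetic_id, skipping rows with no genetic_id."""
    return {
        row["subject_id"]: row["genetic_id"]
        for row in rows
        if row.get("genetic_id")
    }


def undescribed_columns(rows, sidecar, columns):
    """Return the entries of `columns` found in the table but not in the sidecar."""
    present = rows[0].keys() if rows else ()
    return [name for name in columns if name in present and name not in sidecar]


# Same rows as the subjects.tsv example above, with explicit tab separators.
tsv_text = (
    "subject_id\tage\tsex\tgroup\tgenetic_id\tidh_mutation\n"
    "sub-control01\t34\tM\tcontrol\t124587\tyes\n"
    "sub-control02\t12\tF\tcontrol\t548936\tyes\n"
    "sub-patient01\t33\tF\tpatient\t489634\tno\n"
)

# Hypothetical subjects.json content; the description text is a placeholder.
sidecar = {"idh_mutation": {"Description": "placeholder description"}}

rows = load_subjects(tsv_text)
print(genetic_id_map(rows))
print(undescribed_columns(rows, sidecar, ["idh_mutation"]))
```

Identifiers are kept as strings so that leading zeros or non-numeric identifiers in the genetic database survive the round trip.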

Genetic Information

Template:

genetic_info.json

The genetic_info.json file describes the genetic information available in the subjects.tsv file and/or the genetic database described in dataset_description.json.

Datasets containing the Genetics field in dataset_description.json or the genetic_id column in subjects.tsv MUST include this file.

The following fields are defined for genetic_info.json:

Key name | Requirement Level | Data type | Description
GeneticLevel | REQUIRED | string or array of strings | Describes the level of analysis. Values MUST be one of "Genetic", "Genomic", "Epigenomic", "Transcriptomic", "Metabolomic", or "Proteomic". For more information on these levels, see Multi-omics approaches to disease by Hasin et al. 2017.
AnalyticalApproach | OPTIONAL | string or array of strings | Methodology or methodologies used to analyze the "GeneticLevel". Values MUST be taken from the database of Genotypes and Phenotypes (dbGaP) under /Study/Molecular Data Type (for example, SNP Genotypes (Array) or Methylation (CpG)).
SampleOrigin | REQUIRED | string | Describes from which tissue the genetic information was extracted. Must be one of: "blood", "saliva", "brain", "csf", "breast milk", "bile", "amniotic fluid", "other biospecimen".
TissueOrigin | OPTIONAL | string | Describes the type of tissue analyzed for "SampleOrigin" brain. Must be one of: "gray matter", "white matter", "csf", "meninges", "macrovascular", "microvascular".
BrainLocation | OPTIONAL | string | Refers to the location in space of the "TissueOrigin". Values may be an MNI coordinate, a label taken from the Allen Brain Atlas, or a layer, to refer to layer-specific gene expression, which can also tie in with laminar fMRI.
CellType | OPTIONAL | string | Describes the type of cell analyzed. Values SHOULD come from the cell ontology.

To ensure dataset description consistency, we recommend following Multi-omics approaches to disease by Hasin et al. 2017 to determine the GeneticLevel:

  • Genetic: data report on a single genetic location (typically directly in the subjects.tsv file)
  • Genomic: data link to subjects' genome (multiple genetic locations)
  • Epigenomic: data link to subjects' characterization of reversible modifications of DNA
  • Transcriptomic: data link to subjects' RNA levels
  • Metabolomic: data link to subjects' products of cellular metabolic functions
  • Proteomic: data link to quantification of subjects' peptides and proteins

genetic_info.json example:

{
  "GeneticLevel": "Genomic",
  "AnalyticalApproach": ["Whole Genome Sequencing", "SNP/CNV Genotypes"],
  "SampleOrigin": "brain",
  "TissueOrigin": "gray matter",
  "CellType": "neuron",
  "BrainLocation": "[-30 -15 10]"
}
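
The closed vocabularies defined for genetic_info.json lend themselves to a simple automated check. The Python sketch below is a minimal illustration, not an official validator: the function name validate_genetic_info is hypothetical, and AnalyticalApproach, BrainLocation, and CellType are deliberately left unchecked because their allowed values live in external resources (dbGaP, atlases, the cell ontology).

```python
GENETIC_LEVELS = {
    "Genetic", "Genomic", "Epigenomic",
    "Transcriptomic", "Metabolomic", "Proteomic",
}
SAMPLE_ORIGINS = {
    "blood", "saliva", "brain", "csf", "breast milk",
    "bile", "amniotic fluid", "other biospecimen",
}
TISSUE_ORIGINS = {
    "gray matter", "white matter", "csf",
    "meninges", "macrovascular", "microvascular",
}


def validate_genetic_info(info):
    """Return a list of problems found in a parsed genetic_info.json (empty if none)."""
    problems = []

    # GeneticLevel is REQUIRED: one string or an array of strings.
    levels = info.get("GeneticLevel")
    if isinstance(levels, str):
        levels = [levels]
    if not isinstance(levels, list) or not levels:
        problems.append("GeneticLevel is REQUIRED (string or array of strings)")
    else:
        for level in levels:
            if not isinstance(level, str) or level not in GENETIC_LEVELS:
                problems.append(f"GeneticLevel value not allowed: {level!r}")

    # SampleOrigin is REQUIRED and must come from the fixed list.
    sample = info.get("SampleOrigin")
    if not isinstance(sample, str) or sample not in SAMPLE_ORIGINS:
        problems.append(f"SampleOrigin missing or not allowed: {sample!r}")

    # TissueOrigin is OPTIONAL, but restricted to the fixed list when present.
    if "TissueOrigin" in info:
        tissue = info["TissueOrigin"]
        if not isinstance(tissue, str) or tissue not in TISSUE_ORIGINS:
            problems.append(f"TissueOrigin value not allowed: {tissue!r}")

    return problems


# Content of the genetic_info.json example above.
example = {
    "GeneticLevel": "Genomic",
    "AnalyticalApproach": ["Whole Genome Sequencing", "SNP/CNV Genotypes"],
    "SampleOrigin": "brain",
    "TissueOrigin": "gray matter",
    "CellType": "neuron",
    "BrainLocation": "[-30 -15 10]",
}
print(validate_genetic_info(example))
```

The comparison is case-sensitive on purpose, since the allowed values are given with a specific capitalization ("Genomic" but "gray matter").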