Data Sharing


The National Institutes of Health (NIH) has established a central data repository, the database of Genotypes and Phenotypes (dbGaP), for securely storing and sharing human data submitted to NIH under the Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS). Implicit in the establishment of dbGaP is the view that scientific progress in genomic research will be greatly enhanced if the data are readily available to all scientific investigators and shared in a manner consistent with the research participants’ informed consent.

Controlled-access data in dbGaP can only be obtained by a user who has been authorized by the appropriate Data Access Committee (DAC). Information on requesting controlled data access is available on the NIH website. Data available to authorized investigators may include de-identified phenotypes and genotypes for individual study subjects, pedigrees, and precomputed univariate associations between genotype and phenotype (if not made available on the public site).

All data generated in the Broad CMG will be deposited in dbGaP within a year of data generation. To request access to the data, please visit our dbGaP site.


The Matchmaker Exchange (MME) is a federated network of genomic centers that share their patient data to help find the underlying genetic changes that cause rare disease. The goal of this network is to aggregate data on patients with similar disorders from around the world. These centers connect to each other through a common application programming interface (API) that uses a mutually agreed-upon data format.

We built the matchbox software application to serve as our primary bridge to the MME. Building an MME server is a resource-intensive process and a limiting factor for a new center interested in joining this data-sharing network. To help such institutions, we made matchbox open source, easy to install, and free to use.

Our collaborators deposit patient data, such as phenotypes, into our seqr web application, which is our central data aggregation and analysis platform. seqr is a web-based platform for the collaborative analysis of genomic data aggregated by family. It allows users to annotate, comment on, search, and visualize genomic data, which helps our collaborators and data contributors around the world conduct analyses and exchange information efficiently.

Once a family's data are in seqr, phenotypes have been entered, and one or more candidate genes have been identified, the family can be “matched” to find other patients around the world who have similar disorders. This aggregation of similar cases helps users build evidence for gene causality and accelerates novel gene discovery.
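As a rough illustration of what such a match request might look like, the sketch below assembles a request body from phenotype terms and candidate genes. The field names and media type follow our reading of version 1.0 of the MME API and should be checked against the current specification. The function names, the patient identifier, the contact details, and the example phenotype and gene values are hypothetical placeholders, and nothing is sent over the network.

```python
import json

# Media type we assume for version 1.0 of the MME API; verify against the spec.
MME_MEDIA_TYPE = "application/vnd.ga4gh.matchmaker.v1.0+json"


def build_match_request(patient_id, contact_name, contact_href, hpo_ids, gene_ids):
    """Assemble a match-request body from phenotype terms and candidate genes.

    Hypothetical helper for illustration; it is not part of matchbox or seqr.
    """
    if not hpo_ids and not gene_ids:
        raise ValueError("provide at least one phenotype term or candidate gene")

    patient = {
        "id": patient_id,
        "contact": {"name": contact_name, "href": contact_href},
    }
    if hpo_ids:
        # Each phenotype is given as an ontology term marked as observed.
        patient["features"] = [{"id": hpo, "observed": "yes"} for hpo in hpo_ids]
    if gene_ids:
        # Each candidate gene becomes one genomic feature.
        patient["genomicFeatures"] = [{"gene": {"id": gene}} for gene in gene_ids]
    return {"patient": patient}


def build_headers(auth_token):
    """Headers a node would typically expect; the token value is a placeholder."""
    return {
        "Content-Type": MME_MEDIA_TYPE,
        "Accept": MME_MEDIA_TYPE,
        "X-Auth-Token": auth_token,
    }


# Illustrative values only; none of these identify a real patient or contact.
request_body = build_match_request(
    patient_id="example-patient-1",
    contact_name="Example Submitter",
    contact_href="mailto:submitter@example.org",
    hpo_ids=["HP:0001250"],
    gene_ids=["SCN1A"],
)
print(json.dumps(request_body, indent=2))
```

In practice a body like this would be posted to another node's match endpoint together with the headers above, and that node would return its own similar patients in the same format.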

More information can be found on and in a special issue of Human Mutation  (­10/issuetoc)

matchbox can be downloaded from GitHub. Please contact for any questions.


ClinVar is a publicly available database of genomic variation and its relationship to human health, maintained by the National Center for Biotechnology Information (NCBI) and funded by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine. ClinVar catalogs and aggregates variant submissions with their reported clinical significance and supporting information, when available. ClinVar adds value to submitted interpretations by standardizing descriptions of variants, conditions, and terms for clinical significance. This information is made publicly available through ClinVar for use in the healthcare community.

Key ClinVar facts:

  • ClinVar is fully public and freely available.
  • ClinVar is a submission-driven database that holds both primary submissions and expert-curated submissions. The scope of the submission may be as small as a single variant.
  • ClinVar welcomes submissions from clinical testing labs, researchers, locus-specific databases, expert panels, and professional societies.
  • ClinVar adds value to submitted interpretations by standardizing descriptions of variants, conditions, and terms for clinical significance.
    • Variants are mapped to reference sequences and reported in HGVS.
    • Conditions are mapped to concepts in MedGen.
    • Clinical significance terms for Mendelian disorders are reported by ACMG categories.
    • Following variant submission, ClinVar provides the submitter with a conflict report of any differences in interpretation between the submitted variants and those already in ClinVar.
  • More information on ClinVar is available at