CMG analysts will work with collaborators in the seqr framework to identify strong candidate genes and variants in two stages. The analytic approach described below is applied to exome and whole-genome next-generation sequencing data.

The initial analysis will aim to identify candidate variants in known genes associated with the primary disease/phenotype. The majority of filters listed below are relaxed to increase sensitivity to detect potential variants in these known genes.

The subsequent analysis will attempt to identify rare, likely candidates in genes not previously associated with disease by applying filters to exclude unlikely variants:

Frequency Filters 
Allele frequencies from the Genome Aggregation Database (gnomAD) and 1000 Genomes will be used to exclude common variants. An additional popmax filter can be used to further exclude variants that are common in any single population. We typically refer to a variant as rare if its allele frequency is below 1% in these populations.
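As an illustration of the frequency logic described above, the following is a minimal sketch and not seqr's implementation. The function names, the dictionary of per-population frequencies, and the population labels are assumptions made for this example; only the 1% threshold comes from the text.

```python
from typing import Mapping, Optional

# "Rare" threshold taken from the text above: allele frequency below 1%.
RARE_AF_THRESHOLD = 0.01


def popmax(pop_afs: Mapping[str, Optional[float]]) -> float:
    """Return the highest allele frequency observed in any single population.

    Populations with no data (None) are skipped; an empty mapping yields 0.0.
    """
    return max((af for af in pop_afs.values() if af is not None), default=0.0)


def passes_frequency_filter(
    global_af: Optional[float],
    pop_afs: Mapping[str, Optional[float]],
    threshold: float = RARE_AF_THRESHOLD,
    use_popmax: bool = True,
) -> bool:
    """Keep a variant only if it is rare overall and, optionally, in every population."""
    if global_af is not None and global_af >= threshold:
        return False
    if use_popmax and popmax(pop_afs) >= threshold:
        return False
    return True


# Hypothetical variant: rare overall but common in one population.
example_pops = {"afr": 0.03, "nfe": 0.0004, "eas": None}
print(passes_frequency_filter(0.004, example_pops))                    # popmax filter on
print(passes_frequency_filter(0.004, example_pops, use_popmax=False))  # popmax filter off
```

With the popmax filter on, the example variant is excluded because one population exceeds the threshold even though the overall frequency does not.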

Inheritance Patterns 
We interrogate recessive (including homozygous, compound heterozygous, and X-linked), dominant, and de novo modes of inheritance where appropriate based on family structure. In complicated cases, we can use custom inheritance filters to explicitly specify individual genotypes. For more information about the specific criteria for each of these searches, please see the documentation on seqr.
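To make the inheritance modes above concrete, here is a simplified sketch for an autosomal variant in a trio with an affected child and two unaffected parents. It is not the criteria seqr applies (see the seqr documentation for those). The genotype encoding and function names are assumptions made for this example.

```python
from typing import Tuple

# Genotypes are encoded as the number of alternate alleles: 0, 1, or 2.
Trio = Tuple[int, int, int]  # (child, mother, father)


def is_de_novo(trio: Trio) -> bool:
    """Child carries the variant; neither parent does."""
    child, mother, father = trio
    return child >= 1 and mother == 0 and father == 0


def is_homozygous_recessive(trio: Trio) -> bool:
    """Child is homozygous; both parents are heterozygous carriers."""
    child, mother, father = trio
    return child == 2 and mother == 1 and father == 1


def is_compound_het(first: Trio, second: Trio) -> bool:
    """Two heterozygous variants in the same gene, one inherited from each parent."""
    c1, m1, f1 = first
    c2, m2, f2 = second
    if c1 != 1 or c2 != 1:
        return False
    maternal_then_paternal = (m1, f1, m2, f2) == (1, 0, 0, 1)
    paternal_then_maternal = (m1, f1, m2, f2) == (0, 1, 1, 0)
    return maternal_then_paternal or paternal_then_maternal
```

A custom inheritance filter, as mentioned above, amounts to supplying the expected genotype for each individual directly instead of relying on one of these predefined patterns.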

Functional Annotation
The Variant Effect Predictor (VEP) tool is used to annotate variants, and its annotations can be used to filter variants by Sequence Ontology term. The following functional classes are considered in a moderate to high impact search: nonsense, essential splice site, missense, frameshift, and in-frame variants. In addition, deleterious predictions from PolyPhen, SIFT, MutationTaster, and FATHMM can be used as a tentative guide to further prioritize candidate variants. Lastly, all genes and variants present in ClinVar and OMIM will be indicated.
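The sketch below shows one way a moderate to high impact search could be expressed with Sequence Ontology terms. The mapping from the classes named above to specific terms is an assumption made for this example, as is the function name. The exact term set used by seqr may differ. The input is assumed to be a VEP consequence string in which multiple terms are joined with "&".

```python
# Assumed mapping of the functional classes above to Sequence Ontology terms.
MODERATE_TO_HIGH_IMPACT = {
    "stop_gained",              # nonsense
    "splice_donor_variant",     # essential splice site
    "splice_acceptor_variant",  # essential splice site
    "missense_variant",         # missense
    "frameshift_variant",       # frameshift
    "inframe_insertion",        # in-frame
    "inframe_deletion",         # in-frame
}


def is_moderate_to_high_impact(consequence: str) -> bool:
    """Return True if any term in an '&'-joined consequence string is in the set."""
    terms = set(consequence.split("&"))
    return not terms.isdisjoint(MODERATE_TO_HIGH_IMPACT)


print(is_moderate_to_high_impact("missense_variant&splice_region_variant"))
print(is_moderate_to_high_impact("synonymous_variant"))
```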

Genes and Regions
As in the initial analysis, searches can be restricted to a set of candidate genes or genomic loci previously associated with the disease/phenotype. These searches are typically performed in combination with additional filters.
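Restricting a search to candidate genes or loci can be pictured as a simple membership test. This is a minimal sketch under assumed data structures: the Interval type, the function name, the gene symbols, and the coordinates are all hypothetical placeholders.

```python
from typing import Iterable, NamedTuple, Set


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def in_candidate_set(
    gene: str,
    chrom: str,
    pos: int,
    genes: Set[str],
    intervals: Iterable[Interval],
) -> bool:
    """Keep a variant if its gene is on the list or its position falls in a locus."""
    if gene in genes:
        return True
    return any(iv.chrom == chrom and iv.start <= pos <= iv.end for iv in intervals)


# Hypothetical gene list and locus, for illustration only.
candidate_genes = {"GENE_A", "GENE_B"}
candidate_loci = [Interval("1", 1_000_000, 2_000_000)]
print(in_candidate_set("GENE_C", "1", 1_500_000, candidate_genes, candidate_loci))
```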

Quality Filters
All variants detected by the Genome Analysis Toolkit (GATK)-based pipeline have associated quality metrics for both the variant site and each individual genotype, and filters can be applied at either level.
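The two levels of quality filtering can be sketched as follows, using the site-level FILTER value and the per-genotype GQ and AD fields that GATK writes to a VCF. The thresholds and function names here are assumptions chosen for illustration, not recommended or seqr default values.

```python
from typing import Sequence


def allele_balance(ad: Sequence[int]) -> float:
    """Fraction of reads supporting the alternate allele, from AD = (ref, alt)."""
    total = sum(ad)
    return ad[1] / total if total else 0.0


def passes_quality(
    site_filter: str,
    gq: int,
    ad: Sequence[int],
    min_gq: int = 20,      # hypothetical genotype quality cutoff
    min_ab: float = 0.25,  # hypothetical allele balance cutoff
) -> bool:
    """Require a passing site plus adequate genotype quality and allele balance."""
    if site_filter != "PASS":        # site-level metric
        return False
    if gq < min_gq:                  # genotype-level metric
        return False
    return allele_balance(ad) >= min_ab


print(passes_quality("PASS", 99, (20, 18)))
print(passes_quality("PASS", 99, (40, 5)))
```

In this sketch the second call fails on allele balance alone, which shows why site-level and genotype-level metrics are worth checking separately.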

All strong candidates from both stages of analysis will be tagged and classified according to the American College of Medical Genetics and Genomics (ACMG) standards and guidelines. Further analysis will be performed specific to each collaboration agreement. A summary of strong candidates will be available to the collaborator through the seqr interface.

Copy Number Variation

Copy number variation (CNV) analysis will be conducted on exome sequencing data using ExomeDepth and GATK gCNV. Genome sequencing data will be processed with GenomeSTRiP to identify CNVs and Manta to identify both CNVs and structural variants.