GCAD receives sequencing data from ADSP or other collaborators. The WGS/WES data are processed into CRAMs, gVCFs, and project-level VCFs on the GRCh38 reference. These data are then QC-ed, annotated, and shared with the community via NIAGADS.
Workflow diagram stages: Raw data from NIAGADS; Applying ADSP best practice; Population info & functional impact; Deposit into NIAGADS.

Data Production

Once we receive new sequencing data, the samples are processed into CRAMs/gVCFs and stored until they are ready to be joint-genotyped into a project-level VCF (pVCF). Once per year, GCAD generates one pVCF containing all new and previously generated gVCFs. These pVCFs then undergo ADSP QC and are deposited into NIAGADS for access by the research community. The tables below summarize which data have been processed.

Dataset Round | Number of Samples | WGS/WES | Status | CRAM/gVCF Release Date | pVCF Release Date
Release | Project name | WGS/WES | Total Samples Received | gVCFs Generated


All pipelines are co-developed with ADSP investigators.

VCPA pipeline

The SNP/Indel Variant Calling Pipeline and data management tool (VCPA) is the official pipeline used to process all WGS/WES data in GCAD. It is a functionally equivalent pipeline developed jointly by GCAD/ADSP and CCDG/TOPMed. For each sample it outputs a CRAM (after recalibration and indel realignment) and a gVCF (generated with GATK HaplotypeCaller).

Figure 1: A) VCPA architecture; B) Dynamic view of job status; C) Pipeline overview.

For more information, see the VCPA page on the NIAGADS website.

QC pipeline

Diagrams of the Caller-specific QC and Consensus Calling Pipeline, including (a) an overview diagram of the process, (b) details of the caller-specific variant-level QC steps, and (c) details of the post-consensus variant-level QC steps.

Each project-level VCF is QC-ed in a multi-stage process. 1) Pre-QC quality checks are performed, including concordance with GWAS data, sample contamination, relatedness/duplication, and Mendelian inconsistency. 2) Checks on individual genotypes, variants, and samples (e.g. average read depth, average genotype quality score, and departure from Hardy-Weinberg equilibrium) are done next, and variants are flagged when issues arise. 3) Finally, the low-quality genotypes, variants, and samples flagged in the second stage are excluded, and the resulting improvement in data quality is assessed.
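The second-stage variant checks can be sketched in a few lines of Python. This is a minimal illustration, not the ADSP QC implementation: the `VariantStats` fields, the flag names, and all three thresholds are hypothetical, and a simple 1-degree-of-freedom chi-square test stands in for whatever Hardy-Weinberg test the actual pipeline uses.

```python
from dataclasses import dataclass
from math import erfc, sqrt
from typing import List

# Hypothetical thresholds chosen for illustration only.
MIN_MEAN_DEPTH = 10.0
MIN_MEAN_GQ = 20.0
HWE_P_THRESHOLD = 1e-6


@dataclass
class VariantStats:
    """Per-variant summary across samples (field names are assumptions)."""
    variant_id: str
    n_hom_ref: int
    n_het: int
    n_hom_alt: int
    mean_depth: float
    mean_gq: float


def hwe_chi_square_p(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """P-value of a chi-square goodness-of-fit test against HWE proportions."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    if any(e == 0 for e in expected):
        return 1.0  # monomorphic site: the test is not informative
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))


def flag_variant(v: VariantStats) -> List[str]:
    """Return the list of QC flags raised for one variant (empty if it passes)."""
    flags = []
    if v.mean_depth < MIN_MEAN_DEPTH:
        flags.append("low_depth")
    if v.mean_gq < MIN_MEAN_GQ:
        flags.append("low_gq")
    if hwe_chi_square_p(v.n_hom_ref, v.n_het, v.n_hom_alt) < HWE_P_THRESHOLD:
        flags.append("hwe_departure")
    return flags
```

Flagging rather than deleting at this point mirrors the staged design described above: exclusion happens only in the third stage, so the effect of each filter can be assessed separately.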

To learn more about the QC pipeline, please read our publication.

Annotation pipeline

The pipeline generates variant-level assessments of functional impact on genes and genetic regulation. It is based on the Ensembl Variant Effect Predictor (VEP), which overlays exon, transcript, and regulatory element information from the Ensembl database to generate all possible consequences (missense, frameshift, splicing, etc.) that a variant may have. Variant consequences relative to Ensembl/GENCODE transcripts are assigned an impact category (high, moderate, low, etc.), and multiple variant scoring approaches are incorporated (CADD, CATO, etc.).
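As a rough illustration of how per-transcript consequences map to impact categories, the sketch below picks the most severe consequence for one variant. The consequence-to-impact table is a small subset of the Sequence Ontology terms and impact ratings that VEP documents; the function name, the transcript IDs, and the choice to treat unknown terms as MODIFIER are assumptions made for this example and do not describe GCAD's actual ranking logic.

```python
from typing import Dict, List, Tuple

# Subset of VEP consequence terms with their documented impact ratings.
IMPACT_BY_CONSEQUENCE = {
    "frameshift_variant": "HIGH",
    "stop_gained": "HIGH",
    "splice_donor_variant": "HIGH",
    "splice_acceptor_variant": "HIGH",
    "missense_variant": "MODERATE",
    "inframe_deletion": "MODERATE",
    "synonymous_variant": "LOW",
    "splice_region_variant": "LOW",
    "intron_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
}

# Higher rank means more severe.
IMPACT_RANK = {"HIGH": 3, "MODERATE": 2, "LOW": 1, "MODIFIER": 0}


def most_severe(
    consequences_by_transcript: Dict[str, List[str]]
) -> Tuple[str, str, str]:
    """Return (transcript, consequence, impact) with the highest impact.

    Ties keep the first entry encountered. Terms missing from the table are
    treated as MODIFIER, which is an assumption of this sketch.
    """
    best = None
    best_rank = -1
    for transcript, consequences in consequences_by_transcript.items():
        for consequence in consequences:
            impact = IMPACT_BY_CONSEQUENCE.get(consequence, "MODIFIER")
            rank = IMPACT_RANK[impact]
            if rank > best_rank:
                best = (transcript, consequence, impact)
                best_rank = rank
    if best is None:
        raise ValueError("no consequences supplied")
    return best


# Hypothetical transcript IDs for a single variant.
example = {
    "ENST_A": ["intron_variant"],
    "ENST_B": ["missense_variant", "splice_region_variant"],
    "ENST_C": ["synonymous_variant"],
}
print(most_severe(example))
```

Because a variant can overlap several transcripts, the same position may be an intron variant in one transcript and a missense variant in another, which is why the annotation is reported per transcript before any summary is taken.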

Learn more about the GCAD annotation pipeline.