Once we receive new sequencing data, the samples are processed into CRAMs/gVCFs and stored until they are ready for joint genotype calling into a project-level VCF (pVCF). GCAD generates one pVCF containing all new and previously generated gVCFs once per year. These pVCFs then undergo ADSP QC and are deposited into NIAGADS for access by the research community. The tables below provide an update on what data have been processed.
|Dataset Round||Number of Samples||WGS/WES||Status||CRAM/gVCF Release Date||pVCF Release Date|
All pipelines are co-developed with ADSP investigators.
The SNP/Indel Variant Calling Pipeline and data management tool (VCPA) is the official pipeline used for processing all the WGS/WES data in GCAD. It is a functionally equivalent pipeline jointly developed by GCAD/ADSP and CCDG/TOPMed. It outputs a CRAM (after recalibration and indel realignment) as well as a gVCF (generated using GATK HaplotypeCaller).
The project-level VCF is QC-ed via a multi-stage process. 1) Pre-QC quality checks are performed, including concordance with GWAS data, sample contamination, relatedness/duplication, and Mendelian inconsistency. 2) Checks on individual genotypes, variants, and samples (e.g., average read depth, average genotype quality scores, and departure from Hardy-Weinberg equilibrium) are performed next. Variants are flagged when issues arise. 3) Finally, the improvement in quality is assessed after excluding the low-quality genotypes, variants, and samples flagged in the second stage.
To learn more about the QC pipeline, please read our publication.
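As a rough illustration of the kind of variant-level flagging described in the second QC stage, the sketch below tests one variant for departure from Hardy-Weinberg equilibrium with a one-degree-of-freedom chi-square test and combines the result with depth and genotype-quality checks. This is not the ADSP QC implementation: the function names, the flag labels, and all thresholds (`min_depth`, `min_gq`, `hwe_alpha`) are hypothetical values chosen only for the example.

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Chi-square test (1 df) for departure from Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) for one biallelic variant given genotype counts.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one called genotype is required")

    # Reference allele frequency estimated from the observed genotypes.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p

    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)

    # Skip classes with zero expectation (monomorphic variants).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def flag_variant(counts, mean_depth, mean_gq,
                 min_depth=10, min_gq=20, hwe_alpha=1e-6):
    """Return a list of QC flags for one variant.

    All thresholds are illustrative assumptions, not ADSP QC settings.
    """
    flags = []
    if mean_depth < min_depth:
        flags.append("low_depth")
    if mean_gq < min_gq:
        flags.append("low_genotype_quality")
    _, p_value = hwe_chi_square(*counts)
    if p_value < hwe_alpha:
        flags.append("hwe_departure")
    return flags


if __name__ == "__main__":
    # A variant with no heterozygotes among 100 samples departs strongly from HWE.
    print(flag_variant((50, 0, 50), mean_depth=30, mean_gq=50))
```

Flagging rather than dropping variants at this point mirrors the process described above, where exclusion happens only in the final stage. In practice, an exact test is often preferred over the chi-square approximation when genotype counts are small.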
The annotation pipeline generates variant-level assessments of functional impact on genes and genetic regulation. Our pipeline is based on the Ensembl Variant Effect Predictor (VEP), which overlays exon, transcript, and regulatory element information from the Ensembl database to generate all possible consequences (missense, frameshift, splicing, etc.) a variant may have. Variant consequences relative to Ensembl/GENCODE transcripts are assigned an impact category (high, moderate, low, etc.), and multiple variant scoring approaches are incorporated (CADD, CATO, etc.).
Learn more about the GCAD annotation pipeline.
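To make the idea of impact categories concrete, the sketch below maps a few Sequence Ontology consequence terms, as reported by VEP, to their impact categories and picks the most severe one for a variant. It is an illustration only, not the GCAD annotation pipeline: the table is a small subset of VEP's consequence list, the function name `most_severe_impact` is hypothetical, and ignoring unrecognized terms is an assumption made for the example.

```python
# Severity order of the VEP impact categories (higher is more severe).
IMPACT_RANK = {"MODIFIER": 0, "LOW": 1, "MODERATE": 2, "HIGH": 3}

# Small, illustrative subset of consequence terms and their impact categories.
CONSEQUENCE_IMPACT = {
    "frameshift_variant": "HIGH",
    "stop_gained": "HIGH",
    "splice_donor_variant": "HIGH",
    "splice_acceptor_variant": "HIGH",
    "missense_variant": "MODERATE",
    "inframe_deletion": "MODERATE",
    "splice_region_variant": "LOW",
    "synonymous_variant": "LOW",
    "intron_variant": "MODIFIER",
    "intergenic_variant": "MODIFIER",
}


def most_severe_impact(consequence_field):
    """Return the most severe impact among '&'-joined consequence terms.

    Terms missing from the illustrative table are ignored (an assumption);
    returns None when no term is recognized.
    """
    impacts = [
        CONSEQUENCE_IMPACT[term]
        for term in consequence_field.split("&")
        if term in CONSEQUENCE_IMPACT
    ]
    if not impacts:
        return None
    return max(impacts, key=IMPACT_RANK.__getitem__)


if __name__ == "__main__":
    print(most_severe_impact("missense_variant&splice_region_variant"))
```

Because a variant can carry a different consequence for each overlapping transcript, a real workflow would apply a summary like this per transcript or per gene rather than once per variant.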