Amlie-Wolf et al., Inferno - INFERing the molecular mechanisms of NOncoding genetic variants. bioRxiv. 2017 Oct. doi:10.1101/211599. Epub 2017 Oct 30.
Abstract: The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, where they affect regulatory elements including transcriptional enhancers. We propose INFERNO (INFERring the molecular mechanisms of NOncoding genetic variants), a novel method which integrates hundreds of diverse functional genomics data sources with GWAS summary statistics to identify putatively causal noncoding variants underlying association signals. INFERNO comprehensively infers the relevant tissue contexts, target genes, and downstream biological processes affected by causal variants. We apply INFERNO to schizophrenia GWAS data, recapitulating known schizophrenia-associated genes including CACNA1C and discovering novel signals related to transmembrane cellular processes.
- Whole genome sequence analysis of Caribbean Hispanic families with late onset Alzheimer's disease. American Society of Human Genetics. 2017 Oct.
Abstract: Alzheimer disease, the most common form of dementia, lacks effective treatment and is an enormous emotional and financial burden to society. Genetic research in early onset forms of the disease was the basis for current treatment strategies, but genome-wide association and exome sequencing studies of late onset disease have identified several novel susceptibility loci clustering in unique pathways that could lead to additional approaches. In this study we aim to determine whether family-based sequencing can provide a comprehensive and detailed knowledge of rare genetic variation leading to late onset Alzheimer’s disease.
- Naj et al., Genome-wide rare variant imputation and tissue-specific transcriptomic analysis identify novel rare variant candidate loci in late onset Alzheimer's disease: The Alzheimer's Disease Genetics Consortium. American Society of Human Genetics. 2017 Oct.
Abstract: Background: The International Genomics of Alzheimer’s Project (IGAP) genome-wide association study (GWAS) identified 19 LOAD susceptibility loci in addition to APOE; however, the majority of these were common (minor allele frequency (MAF)>0.05). The Haplotype Reference Consortium (HRC) released a dense reference panel (64,976 haplotypes/39,235,157 SNPs) allowing imputation of rare variants (MAF>0.00008) for discovery association testing. ADGC imputed 33 GWAS datasets to HRC to identify novel rare variant associations and genetically-regulated gene expression patterns contributing to LOAD.
Methods: We imputed 14,743 cases and 15,871 controls to the HRC r1.1 reference panel using Minimac3 on the University of Michigan Imputation Server. Logistic regression on individual variants with MAF>0.01 was performed in PLINK v1.9 (generalized linear mixed model in R for family-based variants) using imputed genotype probabilities and meta-analyzed in METAL, while variants with MAF≤0.01 were analyzed using score-based tests and meta-analyzed in the SeqMeta/R package; both analyses adjusted for age, sex, and population substructure. Gene-based association was also performed using SKAT-O and gene-based testing of expression regulation in LOAD was performed using PrediXcan.
Results: Preliminary analyses of ~39.2 million genotyped or imputed SNVs identified single-variant associations of P<5×10^-8 in 5 known IGAP LOAD candidate loci (APOE, BIN1, the MS4A region, PICALM, and CR1), and multiple suggestive associations (P<10^-5) at each of an additional 13 loci, including known IGAP loci: rs13155750 in MEF2C (OR(95% CI): 1.13(1.08,1.20); P=5.09×10^-6) and rs755951 in PTK2B (OR(95% CI): 1.11(1.06,1.16); P=5.61×10^-6). Novel associations include signals at LILRA5 (19:54821819; OR(95% CI): 1.14(1.08,1.20); P=4.84×10^-7), involved in innate immunity pathways; and at SMOX (rs1884732; OR(95% CI): 1.11(1.06,1.17); P=5.17×10^-6), involved in catabolism of polyamines, levels of which are altered in AD brains. Rare variant and gene-based analyses demonstrated significant APOE and TREM2 associations, while gene-based testing identified strong but marginal association for SORL1 (P=5.55×10^-6). PrediXcan analyses identified significant strong genetically-regulated gene expression in LOAD for MS4A4A (Q=4.26×10^-26), BIN1 (Q=1.70×10^-18), and FBXO46 (Q=4.25×10^-8).
Conclusions: Several novel candidate loci for LOAD have been identified using high-quality imputation of rare and low-frequency variants in the ADGC, reinforcing the utility of high-density imputation panels, and providing a resource to newly identify genes with perturbed expression in LOAD.
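As an illustration of the meta-analysis step in the Methods above, here is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis of per-study log odds ratios. The abstract does not state which weighting scheme was used in METAL, so the choice of inverse-variance weighting is an assumption. The function name and the input values are hypothetical and are not taken from the study.

```python
import math


def inverse_variance_meta(betas, ses):
    """Pool per-study log odds ratios with inverse-variance weights.

    betas: per-study effect estimates (log odds ratios)
    ses:   per-study standard errors of those estimates
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical per-dataset estimates for one variant (illustration only).
example_betas = [0.10, 0.14, 0.12]
example_ses = [0.05, 0.04, 0.06]

pooled_beta, pooled_se, z, p = inverse_variance_meta(example_betas, example_ses)
print(f"pooled OR = {math.exp(pooled_beta):.3f}, z = {z:.2f}, p = {p:.2e}")
```

Studies with smaller standard errors receive larger weights, so the pooled standard error is always smaller than that of any single contributing dataset.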
- Beecham et al., Whole-genome sequencing in non-Hispanic white familial late-onset Alzheimer's disease identifies rare variation in AD candidate genes. 2017 Oct.
Abstract: The Alzheimer’s Disease Sequencing Project (ADSP) is an initiative to identify rare genetic variation influencing Late Onset Alzheimer’s Disease (LOAD) risk. As part of the ADSP, we performed whole-genome sequencing (WGS) in 44 non-Hispanic white (NHW) extended families multiply affected by LOAD, followed by extensive quality control, variant filtering, and gene-based association tests. WGS data were generated for 197 persons from 44 NHW families, including both AD and cognitively intact relatives. Alignment was performed using the Burrows-Wheeler algorithm, followed by genotype calling using a consensus calling pipeline that used both GATK genotype calls and ATLAS genotype calls. Variants were annotated for allele frequency and predicted functional impact, categorized into damaging (e.g., loss of function, high CADD scores, etc.) and likely damaging (e.g., non-synonymous, moderate CADD, etc.). We performed gene-based association testing of 32 known AD candidate genes, accounting for family structure using the FSKAT software. Association was performed using rare variants (MAF<0.01) and two models (damaging set only; damaging and likely damaging). Examination of the 32 known LOAD and EOAD genes (largely identified by GWAS-based meta-analyses) confirmed the role of rare functional variation in a number of genes, including FERMT2 (p=0.001) and SLC24A4 (p=0.009 in NHW). The association in FERMT2 survives a Bonferroni correction for 32 genes tested (p-value threshold = 0.05/32 = 0.00156). Both genes still showed association after adjusting for age, sex, and principal components (FERMT2 p-value=0.002; SLC24A4 p-value=0.023). The PICALM gene also showed nominal association after adjusting for age, sex, and principal components (p-value=0.032; unadjusted p-value=0.111). SLC24A4 also showed nominal association in the damaging variant only analysis (p-value=0.026).
These three genes still showed evidence of association after including index SNP genotypes as covariates (FERMT2 p-value=0.002, SLC24A4=0.015, PICALM p-value=0.021). This indicates that the associated variation in the genes is likely independent of the common variants that initially implicated the genes. These results suggest rare, functional variation may influence LOAD risk in multiplex families, even among genes identified through common variation. Variants are currently being validated using other technologies, and follow-up and replication analyses are ongoing.
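The Bonferroni arithmetic quoted in this abstract can be checked with a short sketch. The number of genes tested and the two unadjusted p-values come from the abstract; the function and variable names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


N_GENES_TESTED = 32   # candidate genes tested, per the abstract
ALPHA = 0.05          # family-wise error rate used in the abstract

threshold = bonferroni_threshold(ALPHA, N_GENES_TESTED)

# Unadjusted gene-based p-values reported in the abstract.
reported_p = {"FERMT2": 0.001, "SLC24A4": 0.009}

survives = {gene: p < threshold for gene, p in reported_p.items()}

print(f"threshold = {threshold:.5f}")
for gene, ok in survives.items():
    print(f"{gene}: p = {reported_p[gene]} -> survives correction: {ok}")
```

Under this threshold only FERMT2 passes, which agrees with the abstract's statement that the FERMT2 association survives correction while SLC24A4 is reported without that claim.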
- Schellenberg et al., Large-Scale DNA sequence analysis and Alzheimer's disease genetics. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar.
Abstract: A substantial amount of the heritability of Alzheimer’s disease (AD) remains to be explained. Early family studies led to identification of rare mutations in 3 genes (APP, PSEN1 and PSEN2), and common variants in APOE. Subsequent work using high-density genotyping arrays identified over 30 common variant loci. The advent of low-cost high-throughput DNA sequencing makes it possible to identify additional rare single nucleotide variants (SNVs) in risk and protective genes. Study designs include studies of multiplex late-onset AD kindreds and larger case-control samples. The Alzheimer’s Disease Sequencing Project (ADSP) generated whole genome sequence (WGS) from 1,000 family members and exome sequence data from 6,000 AD cases and 5,000 elderly normal controls. Other efforts are generating additional WGS and WES data relevant to AD. While analyses of these large data sets may yield AD risk genes, identifying rare variants of modest effect size will require much larger data sets. While WGS/WES costs are declining, obtaining genetic data from very large samples will require use of high-density imputation panels to follow up candidates from sequencing experiments. In addition to SNVs, it is now possible to derive high-quality structural variants (SVs; indels, insertions, deletions, copy number variation, and chromosome-level alterations) from sequence data. SVs, particularly short indels and variants <5,000 bp, have not been tested in most disease studies including AD. SVs account for a large part of genetic variation in humans. By using multiple analytic approaches and technologies for detecting genetic variation, we hope to resolve the genetics of AD more completely.
- Wang, Li-San. Role and resources of National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site and Genome Center for Alzheimer's Disease. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar.
Abstract: National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) is a national genetics repository created by NIA to facilitate access by qualified investigators to genotypic data for the study of genetics of late-onset Alzheimer's disease. The Genome Center for Alzheimer's Disease (GCAD) coordinates the integration and meta-analysis of all available Alzheimer’s disease (AD) relevant genetic data with the goal of identifying AD risk/causative/protective genetic variants and eventual therapeutic targets. NIAGADS and GCAD support the Alzheimer's Disease Sequencing Project by collecting and harmonizing genomics data, and developing databases and portals for genomic information retrieval. In this talk I will introduce both initiatives, their roles in ADSP, and how we can help investigators access ADSP data and resources.
- Haines J et al., In Silico functional annotation of genomic variants and multi-gene analyses in late onset Alzheimer's disease. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar.
Abstract: Objectives: A major difficulty in working with any sequencing data is providing the potential functional consequences of the identified variants. Our goal is to provide consistent genomic annotations for Alzheimer disease (AD) sequencing data integrating data from a large set of diverse databases of functional impacts. In addition, these data will be used to focus multi-gene analyses of the detected variants.
Methods: We employed a strategy of integrating in silico functional information and applying it to large scale whole-exome and whole-genome sequencing efforts. We developed a workflow to provide investigators with predicted functional impact (from the Ensembl Variant Effect Predictor), variant allele frequencies observed in other studies (from the Kaviar database and the Wellderly Cohort), predicted loss-of-function status (from SnpEff), and multiple scoring metrics for assessing deleteriousness (including CADD, CATO, and SPIDEX scores).
Results: We annotated over 28 million variants identified from >12,000 AD cases and controls. Of these, approximately 5 million are novel events not reported in multiple reference databases. Because the incredible depth of available data makes annotation of non-coding regions especially challenging, we developed approaches to collapse and combine gene regulatory annotations and (when possible) to assign them to downstream genes. Multi-gene analyses using these annotations are currently underway.
Conclusions: We constructed computational pipelines to generate detailed functional annotations for both coding and non-coding variants that enable hypothesis-driven analyses, and ultimately provide new insights into the pathogenesis of AD.
- Pericak-Vance M et al., Family-based analyses of whole genome sequencing in white, non-Hispanic populations. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar.
Abstract: Objectives: The Alzheimer’s Disease Sequencing Project (ADSP) is an initiative to identify genetic variation influencing risk in late-onset AD (LOAD) with whole-genome sequencing (WGS) on 229 subjects from 42 non-Hispanic White (NHW) extended AD families. We analyzed these data to identify putative risk variants co-segregating with disease.
Methods: Standard bioinformatics protocols were applied, with multiple genotype callers used to develop consensus. Variants were annotated for function, frequency, segregation with disease, and with enhancer and expression QTL data. We examined segregation under consensus and family-specific linkage peaks, as well as within known AD candidate genes.
Results: Within the two consensus linkage regions we identified 32 rare (MAF<0.01) SNVs that segregated with disease, were absent from all cognitively normal individuals, and were putatively functional (CADD score > 10). Within the family-specific linkage regions we identified 12 segregating, putatively functional SNVs, including missense SNVs in TTC3 (CADD=32) and FSIP2 (CADD=25). We also identified 26 variants that were within known candidate genes, co-segregated in >75% of AD patients, were rare (MAF<5%), and were putatively functional (CADD score > 15). The candidate loci include APP, PICALM, PSEN1, GRN, MS4A6A and MEF2C. Analysis of enhancer data identified multiple enhancer SNVs that segregate with disease and may influence gene expression.
Conclusions: This study shows the power of segregation-based family designs in WGS studies of complex diseases like AD, and suggests TTC3 and FSIP2 as AD risk genes. In addition, rare variation in previously identified candidate genes may play a role in familial LOAD risk.
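The filter described in the Results above for the consensus linkage regions (rare, segregating with disease, absent from cognitively normal individuals, CADD score > 10) can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The record layout, the variant identifiers, and the example values are hypothetical, and "segregating" is assumed here to mean that every sequenced affected relative carries the variant, which the abstract does not define.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str            # hypothetical identifier
    maf: float                 # minor allele frequency in a reference panel
    cadd: float                # CADD score
    carriers_affected: int     # affected relatives carrying the variant
    affected_total: int        # affected relatives sequenced
    carriers_unaffected: int   # cognitively normal relatives carrying it


def passes_filter(v, max_maf=0.01, min_cadd=10.0):
    """Apply the thresholds quoted in the abstract to one variant."""
    # Assumption: "segregating" = carried by all sequenced affected relatives.
    segregates = v.affected_total > 0 and v.carriers_affected == v.affected_total
    absent_from_normals = v.carriers_unaffected == 0
    return (v.maf < max_maf and v.cadd > min_cadd
            and segregates and absent_from_normals)


# Hypothetical records, for illustration only.
candidates = [
    Variant("var_a", 0.002, 25.0, 4, 4, 0),   # passes every criterion
    Variant("var_b", 0.030, 30.0, 4, 4, 0),   # too common
    Variant("var_c", 0.001, 8.0, 4, 4, 0),    # CADD too low
    Variant("var_d", 0.001, 20.0, 3, 4, 0),   # incomplete segregation
    Variant("var_e", 0.001, 20.0, 4, 4, 1),   # seen in a normal relative
]

kept = [v.variant_id for v in candidates if passes_filter(v)]
print(kept)
```

The thresholds are keyword arguments so the looser candidate-gene criteria quoted in the same abstract (MAF<5%, CADD score > 15) could be applied by passing different values, although the >75% co-segregation rule would need its own check.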
- Farrer L et al., Novel genetic variants and loci influencing risk of Alzheimer's disease identified by whole exome sequencing using an enriched case-control design. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar.
Abstract: Objectives: To test the hypothesis that AD cases who have close relatives also affected by the disease (“enriched cases”) are more likely than other AD cases to have AD risk variants, we evaluated the association of AD with variants identified by whole exome sequencing (WES) in samples of unrelated non-Hispanic whites (507 enriched cases, 4,917 controls) and Caribbean Hispanics (172 enriched cases, 177 controls) included in the Alzheimer’s Disease Sequencing Project.
Methods: WES data were submitted to a bioinformatics pipeline that included consensus genotype calling of single nucleotide variants (SNVs) and small indels using GATK and ATLAS protocols, and evaluation of cryptic relatedness and differential missingness. Associations were tested using the score test for individual variants and SKAT-O for gene-based tests, adjusting for age, sex, and principal components of ancestry.
Results: We identified significant association with three SNVs near APOE, the previously established AD TREM2-R47H variant, and SNVs in PRSS1, SORBS1, NUFIP1, WDR59 and PKD1L2. Significant associations were also observed with small indels in SHKBP1, ZNF718, ZNF595, and TUBB4Q. Gene-based tests considering only highly deleterious variants revealed significant associations with CD22, PHTF1, PRSS1, SLC38A10, and TMEM82. Novel gene-based associations with DNPH1, FOXD4L1, IGHV3-64, PLCL2, and RPL19 were observed in tests considering high or moderate-effect variants.
Conclusions: This study identified significant AD associations with several novel genes. These findings suggest that persons in families with multiple cases are likely to harbor rare highly penetrant AD risk variants and that studies of enriched cases will help delineate mechanisms leading to AD.
- Mayeux et al., Whole exome and whole genome sequencing in Caribbean Hispanics families. International Conference on Alzheimer's and Parkinson's Diseases. 2017 Mar.
Abstract: While common variants at the APOE locus can influence the risk of late onset Alzheimer’s disease (LOAD), rare coding variants may also alter risk. Families multiply affected by LOAD from inbred, island populations can be enriched for such variants. We have investigated Caribbean Hispanic families from the Dominican Republic multiply affected by LOAD to identify novel coding variants. We used two experimental approaches: 1) to detect rare coding variants underlying loci detected by genome-wide association studies (GWAS) we conducted targeted sequencing of ABCA7, BIN1, CD2AP, CLU, CR1, EPHA1, MS4A4A/MS4A6A, SORL1 and PICALM in three independent LOAD cohorts; 2) whole exome sequencing was also completed in 31 Caribbean Hispanic families without known mutations (e.g. APP, PSEN1 or 2) or APOEε4 homozygous carriers. In the first experiment, a statistically significant 3.1-fold enrichment of the nonsynonymous mutations was found among LOAD cases compared to controls with no difference in synonymous variants. Mutations were identified in ABCA7, CD2AP, EPHA1, SORL1 and BIN1. The EPHA1 variant segregated completely in an extended Caribbean Hispanic family. In the second experiment, rare missense mutations in the Snf2-related CREB binding protein activator, SRCAP, were found in eight unrelated families. In both experiments the frequency of these variants was significantly greater in the affected than in the unaffected family members and significantly different from the frequency found in the Exome Aggregation Consortium for the Latino population. High throughput sequencing of an inbred, island population can reveal an excess burden of deleterious coding mutations in LOAD. Identifying coding variants in LOAD will facilitate the creation of tractable models for investigation of disease-related mechanisms and potential therapies.
- Kunkle BW et al., Genome-wide linkage analyses of non-Hispanic white families identify novel loci for familial late-onset Alzheimer's disease. Alzheimers Dement. 2016 Jan;12(1):2-10. doi: 10.1016/j.jalz.2015.05.020. Epub 2015 Sep 11. PMID: 26365416.
Abstract: INTRODUCTION: Few high penetrance variants that explain risk in late-onset Alzheimer's disease (LOAD) families have been found. METHODS: We performed genome-wide linkage and identity-by-descent (IBD) analyses on 41 non-Hispanic white families exhibiting likely dominant inheritance of LOAD, and having no mutations at known familial Alzheimer's disease (AD) loci, and a low burden of APOE ε4 alleles. RESULTS: Two-point parametric linkage analysis identified 14 significantly linked regions, including three novel linkage regions for LOAD (5q32, 11q12.2-11q14.1, and 14q13.3), one of which replicates a genome-wide association LOAD locus, the MS4A6A-MS4A4E gene cluster at 11q12.2. Five of the 14 regions (3q25.31, 4q34.1, 8q22.3, 11q12.2-14.1, and 19q13.41) are supported by strong multipoint results (logarithm of odds [LOD*] ≥1.5). Nonparametric multipoint analyses produced an additional significant locus at 14q32.2 (LOD* = 4.18). The 1-LOD confidence interval for this region contains one gene, C14orf177, and the microRNA Mir_320, whereas IBD analyses implicate an additional gene, BCL11B, a regulator of brain-derived neurotrophic signaling, a pathway associated with pathogenesis of several neurodegenerative diseases. DISCUSSION: Examination of these regions after whole-genome sequencing may identify highly penetrant variants for familial LOAD.
- Barral S et al., Linkage analyses in Caribbean Hispanic families identify novel loci associated with familial late-onset Alzheimer's disease. Alzheimers Dement. 2015 Dec;11(12):1397-406. doi: 10.1016/j.jalz.2015.07.487. Epub 2015 Oct 1. PMID: 26433351.
Abstract: INTRODUCTION: We performed linkage analyses in Caribbean Hispanic families with multiple late-onset Alzheimer's disease (LOAD) cases to identify regions that may contain disease causative variants. METHODS: We selected 67 LOAD families to perform a genome-wide linkage scan. Analysis of the linked regions was repeated using the entire sample of 282 families. Validated chromosomal regions were analyzed using joint linkage and association. RESULTS: We identified 26 regions linked to LOAD (HLOD ≥3.6). We validated 13 of the regions (HLOD ≥2.5) using the entire family sample. The strongest signal was at 11q12.3 (rs2232932: HLODmax = 4.7, Pjoint = 6.6 × 10^-6), a locus located ∼2 Mb upstream of the membrane-spanning 4A gene cluster. We additionally identified a locus at 7p14.3 (rs10255835: HLODmax = 4.9, Pjoint = 1.2 × 10^-5), a region harboring genes associated with the nervous system (GARS, GHRHR, and NEUROD6). DISCUSSION: Future sequencing efforts should focus on these regions because they may harbor familial LOAD causative mutations.