Leveraging functional annotation to identify genes associated with complex diseases

By Wei Liu, Mo Li, Wenfeng Zhang, Geyu Zhou, Xing Wu, Jiawei Wang, Qiongshi Lu, Hongyu Zhao

Posted 23 Jan 2019
bioRxiv DOI: 10.1101/529297 (published DOI: 10.1371/journal.pcbi.1008315)

To increase statistical power to identify genes associated with complex traits, a number of transcriptome-wide association study (TWAS) methods have been proposed that use gene expression as a mediating trait linking genetic variation and disease. These methods first predict expression levels based on inferred expression quantitative trait loci (eQTLs) and then identify expression-mediated genetic effects on disease by associating phenotypes with the predicted expression levels. The success of these methods critically depends on the identification of eQTLs. However, because of linkage disequilibrium (LD) and the correlation of gene expression across tissues, the inferred eQTLs may not be functional in the corresponding tissue. Here, we introduce a new method called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation) that identifies disease-associated genes by leveraging epigenetic information. By prioritizing SNPs with tissue-specific epigenetic annotation, T-GEN can better identify SNPs that are both statistically predictive and biologically functional. We found that a significantly higher percentage (an increase of 18.7% to 47.2%) of eQTLs identified by T-GEN are inferred to be functional by ChromHMM, and that more of them are deleterious based on their Combined Annotation Dependent Depletion (CADD) scores. Applying T-GEN to 207 complex traits, we identified more trait-associated genes than existing methods did (an increase ranging from 7.7% to 102%). Among the genes identified for these traits, T-GEN identified more genes with high (>0.99) pLI scores than the other methods. When applied to late-onset Alzheimer’s disease, T-GEN identified 96 genes at 15 loci, including two novel loci not implicated in previous GWAS. We further replicated 50 of these genes in an independent GWAS, including genes at one of the two novel loci.

Author summary

TWAS-like methods have been widely applied to understand disease etiology using eQTL data and GWAS results. However, it remains challenging to distinguish true disease-associated genes from genes in strong LD with them, largely because of the misidentification of eQTLs. Here we introduce a novel statistical method named T-GEN that identifies disease-associated genes while taking epigenetic information into account. Compared with current TWAS methods, T-GEN not only selects eQTLs with higher CADD scores and greater functional potential in its gene-expression imputation models, but also identifies more disease-associated genes across 207 traits, as well as more genes with high (>0.99) pLI scores. Applying T-GEN to late-onset Alzheimer’s disease identified 96 genes at 15 loci, including two novel loci. Of the 96 identified genes, 50 were replicated in an independent GWAS.
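
The two-stage TWAS procedure described in the abstract (predict expression from eQTL genotypes, then associate the phenotype with predicted expression) can be sketched on simulated data as follows. This is a minimal, generic illustration, not T-GEN itself: it uses plain ridge regression as a stand-in for the expression-imputation model and does not include T-GEN's epigenetic prioritization of SNPs. The function names (`fit_expression_weights`, `twas_association`), the ridge penalty, and all simulation parameters are hypothetical choices made for this sketch.

```python
import numpy as np

def fit_expression_weights(genotypes, expression, ridge_lambda=1.0):
    """Stage 1 (hypothetical): ridge regression of expression on cis-SNP genotypes
    in a reference panel with both genotype and expression data."""
    X = genotypes - genotypes.mean(axis=0)   # center each SNP
    y = expression - expression.mean()
    p = X.shape[1]
    # Closed-form ridge solution: (X'X + lambda * I)^-1 X'y
    return np.linalg.solve(X.T @ X + ridge_lambda * np.eye(p), X.T @ y)

def twas_association(gwas_genotypes, phenotype, weights):
    """Stage 2: regress the phenotype on genetically predicted expression.
    Returns the effect estimate and its z-statistic (ordinary least squares)."""
    X = gwas_genotypes - gwas_genotypes.mean(axis=0)
    pred = X @ weights                        # imputed expression per individual
    pred = pred - pred.mean()
    y = phenotype - phenotype.mean()
    n = len(y)
    beta = (pred @ y) / (pred @ pred)
    resid = y - beta * pred
    sigma2 = (resid @ resid) / (n - 2)        # residual variance
    se = np.sqrt(sigma2 / (pred @ pred))
    return beta, beta / se

# Simulated example: 20 cis-SNPs, of which the first 3 are true eQTLs (assumed values).
rng = np.random.default_rng(0)
n_ref, n_gwas, p = 500, 2000, 20
maf = rng.uniform(0.1, 0.5, size=p)

def simulate_genotypes(n):
    # Additive coding: 0, 1, or 2 copies of the minor allele per SNP
    return rng.binomial(2, maf, size=(n, p)).astype(float)

true_w = np.zeros(p)
true_w[:3] = [0.5, -0.4, 0.3]

G_ref = simulate_genotypes(n_ref)
expr_ref = G_ref @ true_w + rng.normal(size=n_ref)

G_gwas = simulate_genotypes(n_gwas)
expr_gwas = G_gwas @ true_w + rng.normal(size=n_gwas)   # unobserved in a real GWAS
pheno = 0.5 * expr_gwas + rng.normal(size=n_gwas)       # expression-mediated effect

w_hat = fit_expression_weights(G_ref, expr_ref)
beta, z = twas_association(G_gwas, pheno, w_hat)
print(f"beta = {beta:.3f}, z = {z:.2f}")
```

In this sketch, the GWAS cohort's own expression is used only to generate the phenotype; the association test sees only genotypes and phenotype, which is what allows TWAS to be run on GWAS data. Methods like the one in the abstract differ mainly in how the stage-1 weights are chosen, for example by favoring SNPs with functional annotation.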

Download data

  • Downloaded 1,030 times
  • Download rankings, all-time:
    • Site-wide: 28,519
    • In genetics: 1,224
  • Year to date:
    • Site-wide: 75,279
  • Since beginning of last month:
    • Site-wide: 76,055