Identifying gene sets that are associated with disease can provide valuable biological knowledge, but a fundamental challenge of gene set analyses of GWAS data is linking disease-associated SNPs to genes. Transcriptome-wide association studies (TWAS) can be used to detect associations between the genetically predicted expression of a gene and disease risk, thus implicating candidate disease genes. However, the causal disease genes at TWAS-associated loci generally remain unknown because of gene co-regulation, which induces correlations in predicted expression across genes. We developed a new method, gene co-regulation score (GCSC) regression, to identify gene sets that are enriched for disease heritability explained by the predicted expression of causal disease genes in the gene set. GCSC regresses TWAS chi-square statistics on gene co-regulation scores, which reflect correlations in predicted gene expression. GCSC determines that a gene set is enriched for disease heritability if genes with high co-regulation with the gene set have higher TWAS chi-square statistics than genes with low co-regulation with the gene set, beyond what is expected based on co-regulation with all genes. We verified via simulations that GCSC is well calibrated and well powered to identify gene sets that are enriched for disease heritability explained by predicted expression. We applied GCSC to gene expression data from GTEx (48 tissues) and GWAS summary statistics for 43 independent diseases and complex traits (average N=344K), analyzing a broad set of biological pathways and specifically expressed gene sets. We identified many enriched gene sets, recapitulating known biology. For Alzheimer's disease, we detected evidence of an immune basis, and specifically a role for antigen presentation, in analyses of both biological pathways and specifically expressed gene sets. Our results highlight the advantages of leveraging gene co-regulation within the TWAS framework to identify gene sets associated with disease.
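The regression idea described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a model analogous to LD score regression, in which the expected TWAS chi-square statistic of a gene equals 1 plus N times a weighted sum of its co-regulation scores, and it defines a co-regulation score as the sum of squared predicted-expression correlations between a gene and the members of a gene set. The function names (`block_correlation`, `coregulation_scores`, `simulate_chi2`, `fit_gcsc_like`), the block-diagonal correlation structure, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def block_correlation(rhos, size):
    """Block-diagonal correlation matrix of predicted expression.

    Each entry of `rhos` gives the within-block correlation of one block
    of `size` co-regulated genes (an illustrative assumption).
    """
    g = len(rhos) * size
    R = np.zeros((g, g))
    for b, rho in enumerate(rhos):
        block = np.full((size, size), rho)
        np.fill_diagonal(block, 1.0)
        R[b * size:(b + 1) * size, b * size:(b + 1) * size] = block
    return R


def coregulation_scores(R, membership):
    """Score of gene g for set k: sum of squared correlations r^2 between
    gene g and every gene belonging to set k (membership is a 0/1 matrix)."""
    return (R ** 2) @ membership


def simulate_chi2(R, gene_var, n, rng):
    """Simulate TWAS chi-square statistics under the assumed model:
    z = sqrt(n) * R @ alpha + noise, with noise covariance R."""
    alpha = rng.normal(0.0, np.sqrt(gene_var))  # causal gene effects
    noise = np.linalg.cholesky(R) @ rng.standard_normal(R.shape[0])
    z = np.sqrt(n) * (R @ alpha) + noise
    return z ** 2


def fit_gcsc_like(chi2, scores, n):
    """Ordinary least squares of chi2 on [1, scores].

    Returns the intercept and the per-set coefficients divided by n.
    """
    design = np.column_stack([np.ones(len(chi2)), scores])
    coef, *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return coef[0], coef[1:] / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, size, n_blocks = 10_000, 5, 400  # hypothetical sample size and gene counts
    R = block_correlation(rng.uniform(0.2, 0.8, n_blocks), size)
    in_set = rng.random(n_blocks * size) < 0.25  # hypothetical gene set
    # Column 0: all genes; column 1: the gene set of interest.
    membership = np.column_stack([np.ones(in_set.size), in_set.astype(float)])
    tau_true = np.array([1e-4, 3e-4])  # extra per-gene variance inside the set
    chi2 = simulate_chi2(R, membership @ tau_true, n, rng)
    intercept, tau_hat = fit_gcsc_like(chi2, coregulation_scores(R, membership), n)
    print("intercept:", intercept)
    print("tau (all genes, gene set):", tau_hat)
```

In this sketch, a positive coefficient on the gene-set column corresponds to the enrichment signal described in the abstract: genes more co-regulated with the set carry larger chi-square statistics than co-regulation with all genes alone would predict. A full analysis would also need standard errors and some treatment of estimation noise in the co-regulation scores, both of which are omitted here.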