Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity
Jesse M. Engreitz,
Alkes L Price
Posted 05 Aug 2021
medRxiv DOI: 10.1101/2021.08.02.21261488
Posted 05 Aug 2021
Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies.
- Downloaded 555 times
- Download rankings, all-time:
- Site-wide: 58,450
- In genetic and genomic medicine: 254
- Year to date:
- Site-wide: 10,133
- Since beginning of last month:
- Site-wide: 3,549
Downloads over time
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!