Exome-by-phenome-wide rare variant gene burden association with electronic health record phenotypes
Anastasia M Lucas,
Renae L Judy,
Rachel L Kember,
Regeneron Genetics Center,
Scott M Damrauer,
Marylyn D. Ritchie,
Daniel J. Rader
Posted 15 Oct 2019
bioRxiv DOI: 10.1101/798330 (published DOI: 10.1038/s41591-020-1133-8)
Background: By coupling large-scale DNA sequencing with electronic health records (EHR), "genome-first" approaches can enhance our understanding of the contribution of rare genetic variants to disease. Aggregating rare, loss-of-function variants in a candidate gene into a "gene burden" to test for association with EHR phenotypes can identify both known and novel clinical implications for the gene in human disease. However, this methodology has not yet been applied on both an exome-wide and phenome-wide scale, and the clinical ontologies of rare loss-of-function variants in many genes have yet to be described.
Methods: We leveraged whole exome sequencing (WES) data in participants (N=11,451) in the Penn Medicine Biobank (PMBB) to address on an exome-wide scale the association of a burden of rare loss-of-function variants in each gene with diverse EHR phenotypes using a phenome-wide association study (PheWAS) approach. For discovery, we collapsed rare (minor allele frequency (MAF) ≤ 0.1%) predicted loss-of-function (pLOF) variants (i.e., frameshift insertions/deletions, gain/loss of stop codon, or splice site disruption) per gene to perform a gene burden PheWAS. Subsequent evaluation of the significant gene burden associations was done by collapsing rare (MAF ≤ 0.1%) missense variants with Rare Exonic Variant Ensemble Learner (REVEL) scores ≥ 0.5 into corresponding yet distinct gene burdens, as well as interrogation of individual low-frequency to common (MAF > 0.1%) pLOF variants and missense variants with REVEL ≥ 0.5. We replicated our findings using the UK Biobank's (UKBB) whole exome sequence dataset (N=49,960).
Results: From the pLOF-based discovery phase, we identified 106 gene burdens with phenotype associations at p < 10⁻⁶ from our exome-by-phenome-wide association studies. Positive-control associations included TTN (cardiomyopathy, p=7.83E-13), MYBPC3 (hypertrophic cardiomyopathy, p=3.48E-15), CFTR (cystic fibrosis, p=1.05E-15), CYP2D6 (adverse effects due to opiates/narcotics, p=1.50E-09), and BRCA2 (breast cancer, p=1.36E-07). Of the 106 genes, 12 gene-phenotype relationships were also detected by REVEL-informed missense-based gene burdens and 19 by single-variant analyses, demonstrating the robustness of these gene-phenotype relationships. Three genes showed evidence of association using both additional methods (BRCA1, CFTR, TGM6), leading to a total of 28 robust gene-phenotype associations within PMBB. Furthermore, replication studies in UKBB validated 30 of 106 gene burden associations, of which 12 demonstrated robustness in PMBB.
Conclusion: Our study presents 12 exome-by-phenome-wide robust gene-phenotype associations, which include three proof-of-concept associations and nine novel findings. We show the value of aggregating rare pLOF variants into gene burdens on an exome-wide scale for unbiased association with EHR phenotypes to identify novel clinical ontologies of human genes. Furthermore, we show the significance of evaluating gene burden associations through complementary, yet non-overlapping genetic association studies from the same dataset. Our results suggest that this approach applied to even larger cohorts of individuals with WES or whole-genome sequencing data linked to EHR phenotype data will yield many new insights into the relationship of genetic variation and disease phenotypes.
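
The gene burden idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the consequence labels, the toy variants and samples, and the function names `burden_carriers` and `fisher_exact_two_sided` are all hypothetical, and the abstract does not say which statistical test was used, so Fisher's exact test appears here only as a simple stand-in. The sketch collapses rare (MAF ≤ 0.1%) pLOF variants in one gene into a carrier/non-carrier indicator and tests that indicator against a single binary phenotype.

```python
from math import comb

MAF_THRESHOLD = 0.001  # MAF <= 0.1%, the rarity cutoff quoted in the abstract
# Hypothetical consequence labels standing in for the pLOF classes in the abstract
PLOF_CLASSES = {"frameshift", "stop_gained", "stop_lost", "splice_site"}


def burden_carriers(variants, genotypes, gene):
    """Return the samples carrying at least one rare pLOF variant in `gene`."""
    qualifying = {
        v["id"]
        for v in variants
        if v["gene"] == gene
        and v["maf"] <= MAF_THRESHOLD
        and v["consequence"] in PLOF_CLASSES
    }
    return {sample for sample, carried in genotypes.items() if carried & qualifying}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: carriers / non-carriers. Columns: cases / controls.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x carrier-cases with fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Toy data (entirely made up for illustration)
variants = [
    {"id": "v1", "gene": "GENE_A", "maf": 0.0005, "consequence": "frameshift"},
    {"id": "v2", "gene": "GENE_A", "maf": 0.0002, "consequence": "stop_gained"},
    {"id": "v3", "gene": "GENE_A", "maf": 0.02, "consequence": "stop_gained"},  # too common
    {"id": "v4", "gene": "GENE_A", "maf": 0.0001, "consequence": "missense"},  # not pLOF
    {"id": "v5", "gene": "GENE_B", "maf": 0.0003, "consequence": "splice_site"},
]
genotypes = {  # sample ID -> set of variant IDs the sample carries
    "s1": {"v1"}, "s2": {"v2"}, "s3": {"v1", "v3"}, "s4": {"v3"}, "s5": {"v4"},
    "s6": {"v5"}, "s7": set(), "s8": set(), "s9": set(), "s10": set(),
}
cases = {"s1", "s2", "s3", "s7"}  # samples with the phenotype of interest

carriers = burden_carriers(variants, genotypes, "GENE_A")
non_carriers = set(genotypes) - carriers
a, b = len(carriers & cases), len(carriers - cases)
c, d = len(non_carriers & cases), len(non_carriers - cases)
p_value = fisher_exact_two_sided(a, b, c, d)
print(sorted(carriers), (a, b, c, d), p_value)
```

An exome-by-phenome-wide scan would repeat this test for every gene and every phenotype, which is why a stringent threshold such as the p < 10⁻⁶ quoted in the Results is applied. A real analysis would typically also need to account for covariates, which this sketch omits.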
- Downloaded 962 times
- Download rankings, all-time:
  - Site-wide: 25,181
  - In genomics: 2,354
- Year to date:
  - Site-wide: 32,686
- Since beginning of last month:
  - Site-wide: 51,723