Exome-by-phenome-wide rare variant gene burden association with electronic health record phenotypes

By Joseph Park, Nathan Katz, Xinyuan Zhang, Anastasia M Lucas, Anurag Verma, Renae L Judy, Rachel L Kember, Regeneron Genetics Center, Jinbo Chen, Scott M. Damrauer, Marylyn D. Ritchie, Daniel J Rader

Posted 15 Oct 2019
bioRxiv DOI: 10.1101/798330 (published DOI: 10.1038/s41591-020-1133-8)

Background: By coupling large-scale DNA sequencing with electronic health records (EHR), "genome-first" approaches can enhance our understanding of the contribution of rare genetic variants to disease. Aggregating rare, loss-of-function variants in a candidate gene into a "gene burden" to test for association with EHR phenotypes can identify both known and novel clinical implications for the gene in human disease. However, this methodology has not yet been applied on both an exome-wide and phenome-wide scale, and the clinical ontologies of rare loss-of-function variants in many genes have yet to be described.

Methods: We leveraged whole exome sequencing (WES) data in participants (N=11,451) in the Penn Medicine Biobank (PMBB) to address on an exome-wide scale the association of a burden of rare loss-of-function variants in each gene with diverse EHR phenotypes using a phenome-wide association study (PheWAS) approach. For discovery, we collapsed rare (minor allele frequency (MAF) ≤ 0.1%) predicted loss-of-function (pLOF) variants (i.e., frameshift insertions/deletions, gain/loss of stop codon, or splice site disruption) per gene to perform a gene burden PheWAS. Subsequent evaluation of the significant gene burden associations was done by collapsing rare (MAF ≤ 0.1%) missense variants with Rare Exonic Variant Ensemble Learner (REVEL) scores ≥ 0.5 into corresponding yet distinct gene burdens, as well as interrogation of individual low-frequency to common (MAF > 0.1%) pLOF variants and missense variants with REVEL ≥ 0.5. We replicated our findings using the UK Biobank's (UKBB) whole exome sequence dataset (N=49,960).

Results: From the pLOF-based discovery phase, we identified 106 gene burdens with phenotype associations at p < 10⁻⁶ from our exome-by-phenome-wide association studies. Positive-control associations included TTN (cardiomyopathy, p=7.83E-13), MYBPC3 (hypertrophic cardiomyopathy, p=3.48E-15), CFTR (cystic fibrosis, p=1.05E-15), CYP2D6 (adverse effects due to opiates/narcotics, p=1.50E-09), and BRCA2 (breast cancer, p=1.36E-07). Of the 106 genes, 12 gene-phenotype relationships were also detected by REVEL-informed missense-based gene burdens and 19 by single-variant analyses, demonstrating the robustness of these gene-phenotype relationships. Three genes showed evidence of association using both additional methods (BRCA1, CFTR, TGM6), leading to a total of 28 robust gene-phenotype associations within PMBB. Furthermore, replication studies in UKBB validated 30 of 106 gene burden associations, of which 12 demonstrated robustness in PMBB.

Conclusion: Our study presents 12 exome-by-phenome-wide robust gene-phenotype associations, which include three proof-of-concept associations and nine novel findings. We show the value of aggregating rare pLOF variants into gene burdens on an exome-wide scale for unbiased association with EHR phenotypes to identify novel clinical ontologies of human genes. Furthermore, we show the significance of evaluating gene burden associations through complementary, yet non-overlapping genetic association studies from the same dataset. Our results suggest that this approach applied to even larger cohorts of individuals with WES or whole-genome sequencing data linked to EHR phenotype data will yield many new insights into the relationship of genetic variation and disease phenotypes.
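
The collapsing step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the consequence labels, the record layout, the function names (`is_rare_plof`, `gene_burden`), and the toy data are all hypothetical, and coding the burden as a binary carrier indicator is an assumption, since the abstract does not say how the burden was encoded. Only the MAF ≤ 0.1% threshold and the pLOF classes come from the text above.

```python
from collections import defaultdict

# Hypothetical labels standing in for the pLOF classes named in the Methods
# (frameshift indels, gain/loss of stop codon, splice site disruption).
PLOF_CONSEQUENCES = {"frameshift", "stop_gained", "stop_lost", "splice_site"}
MAF_MAX = 0.001  # MAF <= 0.1%, the threshold stated in the Methods


def is_rare_plof(variant, maf_max=MAF_MAX):
    """True if the variant is both rare and annotated as predicted loss-of-function."""
    return variant["maf"] <= maf_max and variant["consequence"] in PLOF_CONSEQUENCES


def gene_burden(variants, genotypes, maf_max=MAF_MAX):
    """Collapse qualifying variants per gene into {gene: {sample: 0 or 1}}.

    A sample is coded 1 for a gene if it carries at least one alternate allele
    of any rare pLOF variant in that gene (binary coding is an assumption).
    """
    samples = list(genotypes)
    burden = defaultdict(lambda: {s: 0 for s in samples})
    for v in variants:
        if not is_rare_plof(v, maf_max):
            continue
        gene_row = burden[v["gene"]]
        for s in samples:
            # genotypes[s] maps variant id -> alternate allele count (0, 1, or 2)
            if genotypes[s].get(v["id"], 0) > 0:
                gene_row[s] = 1
    return dict(burden)


# Toy data, invented for illustration only.
variants = [
    {"id": "v1", "gene": "TTN", "maf": 0.0004, "consequence": "frameshift"},
    {"id": "v2", "gene": "TTN", "maf": 0.0200, "consequence": "stop_gained"},  # too common
    {"id": "v3", "gene": "CFTR", "maf": 0.0010, "consequence": "splice_site"},
    {"id": "v4", "gene": "CFTR", "maf": 0.0002, "consequence": "missense"},  # not pLOF
]
genotypes = {
    "s1": {"v1": 1},
    "s2": {"v2": 1, "v4": 1},
    "s3": {"v3": 2},
}

burden = gene_burden(variants, genotypes)
```

Each per-gene carrier vector would then be tested against each EHR phenotype in turn, which is the PheWAS part of the design. The choice of association test is not specified in the abstract and is left out of the sketch.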
