
Multi-ancestry gene-trait connection landscape using electronic health record (EHR) linked biobank data

By Binglan Li, Yogasudha Veturi, Anastasia Lucas, Yuki Bradford, Shefali Setia Verma, Anurag Verma, Joseph Park, Wei-Qi Wei, Qiping Feng, Bahram Namjou, Krzysztof Kiryluk, Iftikhar Kullo, Yuan Luo, Milton Pividori, Hae Kyung Im, Casey S Greene, Marylyn D. Ritchie

Posted 26 Oct 2021
medRxiv DOI: 10.1101/2021.10.21.21265225

Understanding the genetic factors underlying complex traits across ancestry groups is key to improving the overall quality of health care for diverse populations in the United States. In recent years, multiple electronic health record-linked (EHR-linked) biobanks have recruited participants from diverse ancestral backgrounds. These biobanks make it possible to obtain phenome-wide association study (PheWAS) summary statistics on a genome-wide scale for different ancestry groups. Moreover, advances in bioinformatics methods, such as transcriptome-wide association study (TWAS) and colocalization analyses, integrate genome-wide association study (GWAS) summary statistics with expression quantitative trait locus (eQTL) data to identify complex trait-related genes. These methods provide new means to accelerate the translation of basic discoveries into clinical utility. Here, we combined the advantages of multi-ancestry biobanks and data-integrative approaches to investigate the multi-ancestry gene-disease connection landscape. We first performed a phenome-wide TWAS separately on Electronic Medical Records and Genomics (eMERGE) III network participants of European ancestry (N = 68,813) and participants of African ancestry (N = 12,658). For each ancestry group, the phenome-wide TWAS tested gene-disease associations between 22,535 genes and 309 curated disease phenotypes in 49 primary human tissues, as well as cross-tissue associations. Next, we identified gene-disease associations shared across the two ancestry groups by combining the ancestry-specific results via meta-analyses. We further applied a Bayesian colocalization method, fastENLOC, to prioritize likely functional gene-disease associations supported by colocalized eQTL and GWAS signals. We replicated the phenome-wide gene-disease analysis in the analogous Penn Medicine BioBank (PMBB) cohorts. We also sought additional validation in the PhenomeXcan UK Biobank (UKBB) database, the PheWAS catalog, and a systematic literature review.
The phenome-wide TWAS identified many proof-of-concept gene-disease associations, e.g., the FTO-obesity association (p = 7.29e-15). It also identified numerous novel disease-associated genes, e.g., an association between GATA6-AS1 and pulmonary heart disease (p = 4.60e-10). In short, the multi-ancestry gene-disease connection landscape provides a rich resource for future multi-ancestry complex disease research. We also highlight the importance of expanding non-European ancestry datasets and the potential of ancestry-specific genetic analyses. Both will be critical to improving our understanding of the genetic architecture of complex disease.
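
The abstract says the ancestry-specific TWAS results were combined via meta-analyses, but it does not name the meta-analysis model. As an illustration only, this sketch assumes a sample-size-weighted Stouffer's Z method, a common way to combine signed z-scores from independent cohorts. The function name `stouffer_meta` and the per-ancestry z-scores are hypothetical; only the two sample sizes come from the abstract.

```python
import math


def stouffer_meta(z_scores, sample_sizes):
    """Sample-size-weighted Stouffer's Z (assumed model, not taken from the paper).

    Each study i gets weight w_i = sqrt(N_i); the combined statistic is
    Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2)), which is standard normal under
    the null when the per-study z-scores are independent.
    """
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need exactly one sample size per z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_meta = numerator / denominator
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2))
    return z_meta, p_meta


# Hypothetical TWAS z-scores for one gene-phenotype pair in each ancestry group;
# the sample sizes are the eMERGE III counts reported in the abstract.
z_eur, z_afr = 4.0, 2.0
z, p = stouffer_meta([z_eur, z_afr], [68_813, 12_658])
print(f"meta z = {z:.3f}, two-sided p = {p:.3g}")
```

Because the signed z-scores are summed, an association whose direction agrees in both groups is strengthened, and one with opposite directions partly cancels. This matches the goal of finding shared gene-disease associations. An inverse-variance-weighted fixed-effect model on effect sizes and standard errors would be an equally plausible alternative.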

Download data

  • Downloaded 126 times
  • Download rankings, all-time:
    • Site-wide: 162,216
    • In genetic and genomic medicine: 1,081
  • Year to date:
    • Site-wide: 16,306
  • Since beginning of last month:
    • Site-wide: 20,111
