Multi-ancestry gene-trait connection landscape using electronic health record (EHR) linked biobank data
Shefali Setia Verma,
Hae Kyung Im,
Casey S Greene,
Marylyn D. Ritchie
Posted 26 Oct 2021
medRxiv DOI: 10.1101/2021.10.21.21265225
Understanding the genetic factors of complex traits across ancestry groups is key to improving the overall quality of health care for diverse populations in the United States. In recent years, multiple electronic health record-linked (EHR-linked) biobanks have recruited participants of diverse ancestry backgrounds; these biobanks make it possible to obtain phenome-wide association study (PheWAS) summary statistics on a genome-wide scale for different ancestry groups. Moreover, advances in bioinformatics methods, such as transcriptome-wide association studies (TWAS) and colocalization analyses, provide novel means to accelerate the translation of basic discoveries to clinical utility by integrating genome-wide association study (GWAS) summary statistics and expression quantitative trait locus (eQTL) data to identify complex trait-related genes. Here, we combined the advantages of multi-ancestry biobanks and data integrative approaches to investigate the multi-ancestry, gene-disease connection landscape. We first performed a phenome-wide TWAS separately on Electronic Medical Records and Genomics (eMERGE) III network participants of European ancestry (N = 68,813) and of African ancestry (N = 12,658). For each ancestry group, the phenome-wide TWAS tested gene-disease associations between 22,535 genes and 309 curated disease phenotypes in 49 primary human tissues, as well as cross-tissue associations. Next, we identified gene-disease associations that were shared across the two ancestry groups by combining the ancestry-specific results via meta-analyses. We further applied a Bayesian colocalization method, fastENLOC, to prioritize likely functional gene-disease associations with supportive colocalized eQTL and GWAS signals. We replicated the phenome-wide gene-disease analysis in the analogous Penn Medicine BioBank (PMBB) cohorts and sought additional validation in the PhenomeXcan UK Biobank (UKBB) database, the PheWAS catalog, and a systematic literature review.
The phenome-wide TWAS identified many proof-of-concept gene-disease associations, e.g., the FTO-obesity association (p = 7.29e-15), and numerous novel disease-associated genes, e.g., the association between GATA6-AS1 and pulmonary heart disease (p = 4.60e-10). In short, the multi-ancestry, gene-disease connection landscape provides a rich resource for future multi-ancestry complex disease research. We also highlight the importance of expanding the size of non-European ancestry datasets and the potential of ancestry-specific genetic analyses, as both will be critical to improving our understanding of the genetic architecture of complex disease.
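The abstract describes combining ancestry-specific TWAS results via meta-analysis but does not name the method used. As an illustration only, the sketch below assumes a sample-size-weighted Z-score (Stouffer) meta-analysis, one common choice for combining two cohorts. The function names and the input Z-scores are hypothetical; only the two sample sizes come from the abstract.

```python
from math import erfc, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def p_to_z(p_value: float, effect_sign: int) -> float:
    """Convert a two-sided p-value plus an effect direction into a signed Z-score."""
    # inv_cdf(p/2) is the lower-tail quantile (negative); negate it for the magnitude.
    magnitude = -_STD_NORMAL.inv_cdf(p_value / 2.0)
    return magnitude if effect_sign >= 0 else -magnitude


def z_to_p(z: float) -> float:
    """Two-sided p-value for a Z-score, using erfc for accuracy in the tails."""
    return erfc(abs(z) / sqrt(2.0))


def weighted_z_meta(z_scores, sample_sizes):
    """Sample-size-weighted Stouffer combination: sum(w_i * z_i) / sqrt(sum(w_i^2)),
    with w_i = sqrt(N_i). This is an assumed method, not confirmed by the abstract."""
    if len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per Z-score")
    weights = [sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


# Sample sizes reported in the abstract (European and African ancestry cohorts).
N_EUROPEAN = 68_813
N_AFRICAN = 12_658

# Hypothetical per-ancestry Z-scores for a single gene-disease pair.
example_z = [5.2, 2.1]
combined_z = weighted_z_meta(example_z, [N_EUROPEAN, N_AFRICAN])
print(f"combined Z = {combined_z:.3f}, two-sided p = {z_to_p(combined_z):.3e}")
```

With weights proportional to the square root of sample size, the larger European ancestry cohort contributes more to the combined statistic, which is one reason the abstract's call for larger non-European datasets matters for shared-signal detection.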
- Downloaded 126 times
- Download rankings, all-time:
  - Site-wide: 162,216
  - In genetic and genomic medicine: 1,081
- Year to date:
  - Site-wide: 16,306
- Since beginning of last month:
  - Site-wide: 20,111