A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog
Emily H. Bowler,
Laura W. Harris,
Heather A. Junkins,
Aoife C. McMahon,
Lucia A. Hindorff,
Jacqueline A. L. MacArthur
Posted 21 Apr 2017
bioRxiv DOI: 10.1101/129395 (published DOI: 10.1186/s13059-018-1396-2)
Background: The accurate description of ancestry is essential to interpret and integrate human genomics data, and to ensure that advances in the field of genomics benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the consistent, unambiguous and standardized description of ancestry. To fill this gap, we provide a framework designed for the representation of ancestry in GWAS data, but with wider application to studies and resources involving human subjects.
Results: Here we describe our framework and its application to the representation of ancestry data in a widely used, publicly available genomics resource, the NHGRI-EBI GWAS Catalog. We present the first analyses of GWAS data using our ancestry categories, demonstrating the validity of the framework to facilitate the tracking of ancestry in big data sets. We demonstrate the broader relevance and integration potential of our method by using it to describe the well-established HapMap and 1000 Genomes reference populations. Finally, to encourage adoption, we outline recommendations for authors to implement when describing samples.
Conclusions: While the known bias towards inclusion of European ancestry individuals in GWA studies persists, African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations, suggesting that analyses including these groups may be more effective at identifying new associations. We believe the widespread adoption of our framework will increase standardization of ancestry data, thus enabling improved analysis, interpretation and integration of human genomics data and furthering our understanding of disease.
- Downloaded 664 times
- Download rankings, all-time:
  - Site-wide: 22,968 out of 94,912
  - In genetics: 1,437 out of 4,824
- Download rankings, year to date:
  - Site-wide: 54,799 out of 94,912
- Download rankings, since beginning of last month:
  - Site-wide: 48,157 out of 94,912