A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog

By Joannella Morales, Emily H. Bowler, Annalisa Buniello, Maria Cerezo, Peggy Hall, Laura W. Harris, Emma Hastings, Heather A. Junkins, Cinzia Malangone, Aoife C. McMahon, Annalisa Milano, Danielle Welter, Tony Burdett, Fiona Cunningham, Paul Flicek, Helen Parkinson, Lucia A Hindorff, Jacqueline A. L. MacArthur

Posted 21 Apr 2017
bioRxiv DOI: 10.1101/129395 (published DOI: 10.1186/s13059-018-1396-2)

Background: The accurate description of ancestry is essential to interpret and integrate human genomics data, and to ensure that advances in the field of genomics benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the consistent, unambiguous and standardized description of ancestry. To fill this gap, we provide a framework designed for the representation of ancestry in GWAS data, but with wider application to studies and resources involving human subjects.

Results: Here we describe our framework and its application to the representation of ancestry data in a widely used, publicly available genomics resource, the NHGRI-EBI GWAS Catalog. We present the first analyses of GWAS data using our ancestry categories, demonstrating the validity of the framework for tracking ancestry in large data sets. We show the broader relevance and integration potential of our method by using it to describe the well-established HapMap and 1000 Genomes reference populations. Finally, to encourage adoption, we outline recommendations for authors to implement when describing samples.

Conclusions: While the known bias towards inclusion of European ancestry individuals in GWA studies persists, African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations, suggesting that analyses including these groups may be more effective at identifying new associations. We believe the widespread adoption of our framework will increase standardization of ancestry data, thus enabling improved analysis, interpretation and integration of human genomics data and furthering our understanding of disease.

Download data

  • Downloaded 664 times
  • Download rankings, all-time:
    • Site-wide: 22,968 out of 94,912
    • In genetics: 1,437 out of 4,824
  • Year to date:
    • Site-wide: 54,799 out of 94,912
  • Since beginning of last month:
    • Site-wide: 48,157 out of 94,912
