The MRC IEU OpenGWAS data infrastructure
George Davey Smith,
Tom R Gaunt,
Posted 10 Aug 2020
bioRxiv DOI: 10.1101/2020.08.10.244293
Data generated by genome-wide association studies (GWAS) are growing rapidly with the linkage of biobank samples to health records and the expanding capture of high-dimensional molecular phenotypes. However, the utility of these efforts can only be fully realised if their complete results are collected from heterogeneous sources and formats, harmonised and made programmatically accessible. Here we present the OpenGWAS database, an open source, open access, scalable and high-performance cloud-based data infrastructure that imports and publishes complete GWAS summary datasets and metadata for the scientific community. Our import pipeline harmonises these datasets against dbSNP and the human genome reference sequence, generates summary reports and standardises the format of results and metadata. Users can access the data via a website, an application programming interface (API), R and Python packages, and as downloadable files that can be rapidly queried in high-performance computing environments. OpenGWAS currently contains 126 billion genetic associations from 14,582 complete GWAS datasets representing a range of human phenotypes and disease outcomes across different populations. We developed R and Python packages to serve as conduits between these GWAS data sources and a range of available analytical tools, enabling Mendelian randomization, genetic colocalisation analysis, fine mapping, genetic correlation and locus visualisation. OpenGWAS is freely accessible at https://gwas.mrcieu.ac.uk and has been designed to facilitate integration with third-party analytical tools.
### Competing Interest Statement
TRG, GH and GDS have received research funding from GlaxoSmithKline and Biogen for projects that use the MRC IEU OpenGWAS database. VH has previously been supported by funding from GlaxoSmithKline. Neither company had any input into or control over the contents of this manuscript. Oracle have provided cloud resources to host the OpenGWAS database.
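The abstract states that the import pipeline harmonises each dataset against dbSNP and the human genome reference sequence. As a rough illustration of what allele harmonisation involves in general, the following is a minimal sketch, not the OpenGWAS pipeline's actual code. It assumes single-nucleotide variants, a reference base already looked up for each position, and hypothetical names (`Association`, `harmonise`) chosen for illustration only. Each record is aligned so that the non-effect allele matches the reference, flipping the effect sign and allele frequency when the alleles must be swapped.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Watson-Crick complements, used to read a SNV from the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass(frozen=True)
class Association:
    """One hypothetical GWAS summary-statistic row (illustrative fields only)."""
    chrom: str
    pos: int
    effect_allele: str
    other_allele: str
    beta: float
    eaf: Optional[float]  # effect allele frequency, if reported


def harmonise(assoc: Association, ref_base: str) -> Optional[Association]:
    """Align a SNV so that other_allele equals the reference base.

    Returns None when neither allele (on either strand) matches the reference.
    """
    ref = ref_base.upper()
    ea, oa = assoc.effect_allele.upper(), assoc.other_allele.upper()

    # Palindromic SNVs (A/T, C/G) look identical on both strands, so a strand
    # flip cannot be detected; this sketch simply assumes the forward strand.
    palindromic = COMPLEMENT.get(ea) == oa
    candidates = [(ea, oa)]
    if not palindromic and ea in COMPLEMENT and oa in COMPLEMENT:
        candidates.append((COMPLEMENT[ea], COMPLEMENT[oa]))  # reverse-strand reading

    for e, o in candidates:
        if o == ref:
            # Already aligned: the effect allele is the non-reference allele.
            return replace(assoc, effect_allele=e, other_allele=o)
        if e == ref:
            # Swap alleles; the effect now refers to the other allele, so the
            # sign of beta and the allele frequency are both flipped.
            flipped_eaf = None if assoc.eaf is None else 1.0 - assoc.eaf
            return replace(assoc, effect_allele=o, other_allele=e,
                           beta=-assoc.beta, eaf=flipped_eaf)
    return None
```

For example, a row reporting effect allele G and other allele A at a position where the reference base is G comes back with effect allele A, a negated beta and `1 - eaf`. Production pipelines typically also validate positions against the reference build and handle indels and multi-allelic sites separately; this sketch omits those steps.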