Design and quality control of large-scale two-sample Mendelian randomisation studies
Fatty Acids in Cancer Mendelian Randomization Collaboration,
Philip C Haycock,
Maria Carolina Borges,
Rozenn N. Lemaitre,
Nikhil K. Khankari,
Konstantinos K. Tsilidis,
Amanda B Spurdle,
Matthew H Law,
Fatemeh Saberi Hosnijeh,
Rayjean J Hung,
Marc J Gunter,
George Davey Smith,
Richard M Martin
Posted 01 Aug 2021
medRxiv DOI: 10.1101/2021.07.30.21260578
Background: Mendelian randomization studies are susceptible to metadata errors (e.g. incorrect specification of the effect allele column) and other analytical issues that can introduce substantial bias into analyses. We developed a quality control (QC) pipeline for the Fatty Acids in Cancer Mendelian Randomization Collaboration (FAMRC) that can be used to identify and correct such errors.

Methods: We invited cancer GWAS to share summary association statistics with the FAMRC and subjected the collated data to a comprehensive QC pipeline. We identified metadata errors by comparing study-specific statistics to external reference datasets (the NHGRI-EBI GWAS Catalog and the 1000 Genomes superpopulations), and other analytical issues by comparing reported to expected genetic effect sizes. Comparisons were based on three sets of genetic variants: 1) GWAS hits for fatty acids, 2) GWAS hits for cancer and 3) a 1000 Genomes reference set.

Results: We collated summary data from six fatty acid and 49 cancer GWAS. Metadata errors and analytical issues with the potential to introduce substantial bias were identified in seven studies (13%). After resolving analytical issues and excluding unreliable data, we created a dataset of 219,842 genetic associations with 87 cancer types.

Conclusion: In this large MR collaboration, 13% of included studies were affected by a substantial metadata error or other analytical issue. By increasing the integrity of collated summary data prior to their analysis, our protocol can be used to increase the reliability of post-GWAS analyses. Our pipeline is available to other researchers via the CheckSumStats package (https://github.com/MRCIEU/CheckSumStats).
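
The two kinds of comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the CheckSumStats interface: the function names, the correlation threshold and the toy numbers below are all hypothetical, chosen only to show the idea. The first check compares reported effect allele frequencies with reference frequencies for the same alleles (a negative correlation suggests a mis-specified effect allele column); the second compares reported with expected effect sizes (a slope far from 1 suggests a scale or analytical problem). How the expected effect sizes are derived is not covered here and they are taken as given.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def flag_effect_allele_mismatch(reported_eaf, reference_eaf, min_corr=0.0):
    """Return True when reported and reference allele frequencies disagree.

    min_corr is an illustrative threshold (an assumption of this sketch),
    not a value taken from the paper.
    """
    return pearson(reported_eaf, reference_eaf) < min_corr


def effect_size_slope(reported, expected):
    """Least-squares slope through the origin of reported on expected effects.

    A value near 1 means the reported effects are on the expected scale.
    """
    sxx = sum(e * e for e in expected)
    if sxx == 0:
        raise ValueError("expected effects are all zero")
    return sum(r * e for r, e in zip(reported, expected)) / sxx


# Toy data, made up for illustration only.
reference_eaf = [0.10, 0.30, 0.60, 0.80]
reported_ok = [0.12, 0.28, 0.63, 0.79]
reported_flipped = [1 - f for f in reported_ok]  # frequencies of the other allele

print(flag_effect_allele_mismatch(reported_ok, reference_eaf))       # False
print(flag_effect_allele_mismatch(reported_flipped, reference_eaf))  # True

expected_beta = [0.10, -0.20, 0.30]
reported_beta = [2 * b for b in expected_beta]  # effects reported on a doubled scale
print(effect_size_slope(reported_beta, expected_beta))
```

In practice a flagged study would be inspected by hand before any correction is made, since a negative correlation can also arise from a population mismatch between the study and the reference.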