This study investigates the creation of polygenic scores (PGSs) for human population research. A PGS is a linear, usually weighted, combination of risk alleles that estimates an individual's cumulative genetic risk for a particular trait. While PGSs are conceptually simple, there are numerous ways to estimate them, and not all serve the same end goals. In this paper, we systematically investigate the impact of four key decisions in building PGSs from published genome-wide association meta-analysis results: 1) whether to use single nucleotide polymorphisms (SNPs) assessed by imputation, 2) which criteria to use for selecting SNPs to include in the score, 3) whether to account for linkage disequilibrium (LD), and 4) if accounting for LD, which type of method best captures the correlation structure among SNPs (i.e., clumping vs. pruning). Using the Health and Retirement Study (HRS), a nationally representative, population-based longitudinal panel study of Americans over the age of 50, we examine the predictive ability of PGSs arising from these different estimation approaches, as well as their variability and co-variability. We examine four traits with large published and replicated genome-wide association studies (height, body mass index, educational attainment, and depression). Our central finding is that PGSs that include all available SNPs either explain the most variation in an outcome or are not significantly different from the PGS that does. Thus, for reproducibility through rigor and transparency, we recommend that researchers include a PGS with all available SNPs as a reference, and provide substantial justification for using alternative methods.
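To make the definition concrete, a weighted PGS for one individual can be written as the sum over SNPs of (risk-allele count × published effect size). The sketch below is only an illustration of that definition, not the authors' pipeline. The function names, the toy dosages, weights, and p-values, and the use of a p-value cutoff as the SNP selection criterion are all assumptions made for the example. It does not address imputation or LD (clumping or pruning).

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele counts for one individual.

    dosages: risk-allele counts per SNP (0, 1, or 2; fractional if imputed)
    weights: per-SNP effect sizes, e.g. from published GWAS summary results
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def select_snps(pvalues, threshold):
    """Indices of SNPs whose p-value is at or below the threshold.

    Hypothetical selection rule for illustration; a threshold of 1.0
    keeps every available SNP.
    """
    return [i for i, p in enumerate(pvalues) if p <= threshold]


def thresholded_score(dosages, weights, pvalues, threshold):
    """PGS restricted to the SNPs that pass the p-value threshold."""
    keep = select_snps(pvalues, threshold)
    return polygenic_score([dosages[i] for i in keep],
                           [weights[i] for i in keep])


# Toy data (invented for illustration only).
dosages = [0, 1, 2, 1]
weights = [0.10, -0.05, 0.20, 0.00]
pvalues = [1e-8, 0.03, 0.2, 0.6]

score_all = thresholded_score(dosages, weights, pvalues, 1.0)
score_strict = thresholded_score(dosages, weights, pvalues, 0.05)
print(score_all, score_strict)
```

Setting the threshold to 1.0 corresponds to the all-SNP reference score recommended in the abstract, while stricter thresholds give the alternative scores it would be compared against.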