Efficient estimation and applications of cross-validated genetic predictions

By Joel Mefford, Danny Park, Zhili Zheng, Arthur Ko, Mika Ala-Korpela, Markku Laakso, Päivi Pajukanta, Jian Yang, John Witte, Noah Zaitlen

Posted 11 Jan 2019
bioRxiv DOI: 10.1101/517821

Large-scale cohorts with combined genetic and phenotypic data, coupled with methodological advances, have produced increasingly accurate genetic predictors of complex human phenotypes, called polygenic risk scores (PRS). In addition to the potential translational impact of identifying at-risk individuals, PRS are being used for a growing list of scientific applications, including causal inference, identification of pleiotropy and genetic correlation, and powerful gene-based and mixed-model association tests. Existing PRS approaches rely on external large-scale genetic cohorts that have also measured the phenotype of interest. They further require matching on ancestry and on genotyping platform or imputation quality. In this work we present a novel reference-free method to produce PRS that does not rely on an external cohort. We show that naive implementations of reference-free PRS result in either substantial over-fitting or prohibitive increases in computational time. We show that our algorithm avoids both issues and can produce informative in-sample PRS over any existing cohort without over-fitting. We then demonstrate several novel applications of reference-free PRS, including detection of pleiotropy across 246 metabolic traits and efficient mixed-model association testing.
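
The trade-off the abstract describes for naive reference-free PRS can be illustrated with a small sketch. This is not the authors' algorithm. It assumes ridge regression as a stand-in for a PRS model, simulated standard-normal "genotypes", and a phenotype with no genetic signal at all. The function names (`ridge_fit`, `in_sample_scores`, `out_of_fold_scores`) are hypothetical. Fitting and scoring the same individuals over-fits, while K-fold cross-validation avoids that at the cost of one refit per fold.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights: solve (X'X + lam*I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def in_sample_scores(X, y, lam):
    """Naive score: fit and predict on the same individuals."""
    return X @ ridge_fit(X, y, lam)

def out_of_fold_scores(X, y, lam, n_folds=5, seed=0):
    """K-fold score: each individual is predicted by a model
    trained without them. Needs one full refit per fold."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        beta = ridge_fit(X[train_mask], y[train_mask], lam)
        scores[test_idx] = X[test_idx] @ beta
    return scores

# Simulated data (an assumption for illustration, not real genotypes).
rng = np.random.default_rng(1)
n, p = 200, 500                  # more markers than individuals
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)       # phenotype independent of X
lam = 1.0

r2_naive = np.corrcoef(in_sample_scores(X, y, lam), y)[0, 1] ** 2
r2_cv = np.corrcoef(out_of_fold_scores(X, y, lam), y)[0, 1] ** 2
print(f"in-sample r^2:   {r2_naive:.3f}")
print(f"out-of-fold r^2: {r2_cv:.3f}")
```

Because the simulated phenotype is pure noise, any apparent predictive accuracy of the in-sample score is over-fitting. The out-of-fold score does not share this problem, but its cost grows with the number of folds, which is the computational burden the abstract refers to.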

Download data

  • Downloaded 1,088 times
  • Download rankings, all-time:
    • Site-wide: 19,288
    • In genetics: 928
  • Year to date:
    • Site-wide: 35,248
  • Since beginning of last month:
    • Site-wide: 27,532
