Rxivist logo

Improved polygenic prediction by Bayesian multiple regression on summary statistics

By Luke R. Lloyd-Jones, Jian Zeng, Julia Sidorenko, Loïc Yengo, Gerhard Moser, Kathryn E Kemper, Huanwei Wang, Zhili Zheng, Reedik Magi, Tonu Esko, Andres Metspalu, Naomi R Wray, Michael E Goddard, Jian Yang, Peter M Visscher

Posted 17 Jan 2019
bioRxiv DOI: 10.1101/522961 (published DOI: 10.1038/s41467-019-12653-0)

The capacity to accurately predict an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. Recently, Bayesian methods for generating polygenic predictors have been successfully applied in human genomics but require the individual level data, which are often limited in their access due to privacy or logistical concerns, and are computationally very intensive. This has motivated methodological frameworks that utilise publicly available genome-wide association studies (GWAS) summary data, which now for some traits include results from greater than a million individuals. In this study, we extend the established summary statistics methodological framework to include a class of point-normal mixture prior Bayesian regression models, which have been shown to generate optimal genetic predictions and can perform heritability estimation, variant mapping and estimate the distribution of the genetic effects. In a wide range of simulations and cross-validation using 10 real quantitative traits and 1.1 million variants on 350,000 individuals from the UK Biobank (UKB), we establish that our summary based method, SBayesR, performs similarly to methods that use the individual level data and outperforms other state-of-the-art summary statistics methods in terms of prediction accuracy and heritability estimation at a fraction of the computational resources. We generate polygenic predictors for body mass index and height in two independent data sets and show that by exploiting summary statistics on 1.1 million variants from the largest GWAS meta-analysis (n ≈ 700, 000) that the SBayesR prediction R2 improved on average across traits by 6.8% relative to that estimated from an individual-level data BayesR analysis of data from the UKB (n ≈ 450, 000). Compared with commonly used state-of-the-art summary- based methods, SBayesR improved the prediction R2 by 4.1% relative to LDpred and by 28.7% relative to clumping and p-value thresholding. SBayesR gave comparable prediction accuracy to the recent RSS method, which has a similar model, but at a computational time that is two orders of magnitude smaller. The methodology is implemented in a very efficient and user-friendly software tool titled GCTB.

Download data

  • Downloaded 2,603 times
  • Download rankings, all-time:
    • Site-wide: 2,516 out of 94,912
    • In genetics: 191 out of 4,824
  • Year to date:
    • Site-wide: 16,184 out of 94,912
  • Since beginning of last month:
    • Site-wide: 22,966 out of 94,912

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)