Demographic history impacts stratification in polygenic scores

By Arslan A. Zaidi, Iain Mathieson

Posted 20 Jul 2020
bioRxiv DOI: 10.1101/2020.07.20.212530

Large genome-wide association studies (GWAS) have identified many loci exhibiting small but statistically significant associations with complex traits and disease risk. However, control of population stratification continues to be a limiting factor, particularly when calculating polygenic scores where subtle biases can cumulatively lead to large errors. We simulated GWAS under realistic models of demographic history to study the effect of residual stratification in large GWAS. We show that when population structure is recent, it cannot be fully corrected using principal components based on common variants, the standard approach, because common variants are uninformative about recent demographic history. Consequently, polygenic scores calculated from such GWAS results are biased in that they recapitulate non-genetic environmental structure. Principal components calculated from rare variants or identity-by-descent segments largely correct for this structure if environmental effects are smooth. However, even these corrections are not effective for local or batch effects. While sibling-based association tests are immune to stratification, the hybrid approach of ascertaining variants in a standard GWAS and then re-estimating effect sizes in siblings reduces but does not eliminate bias. Finally, we show that rare variant burden tests are relatively robust to stratification. Our results demonstrate that the effect of population stratification on GWAS and polygenic scores depends not only on the frequencies of tested variants and the distribution of environmental effects but also on the demographic history of the population.

Competing Interest Statement

The authors have declared no competing interest.
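As a rough illustration of what stratification means in the abstract above, the following toy sketch simulates a variant with no causal effect whose allele frequency differs between two subpopulations that also differ in a non-genetic environmental effect. It is not the authors' simulation framework: the two-population setup, the allele frequencies, the effect size, and the helper `ols_slope` are all assumptions made for illustration. The subpopulation label stands in for a covariate that captures the structure perfectly, which is precisely what the abstract says common-variant principal components may fail to do when structure is recent.

```python
import numpy as np


def ols_slope(y, x, covariates=None):
    """Least-squares coefficient of x on y, with an intercept and optional covariates.

    Hypothetical helper for this sketch, not part of any GWAS package.
    """
    cols = [np.ones_like(x, dtype=float), x.astype(float)]
    if covariates is not None:
        cols.append(covariates.astype(float))
    design = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]  # coefficient on the genotype column


rng = np.random.default_rng(0)
n = 20_000

# Assumed toy structure: two equally sized subpopulations.
pop = rng.integers(0, 2, size=n)

# The variant is null, but its allele frequency differs by subpopulation.
freq = np.where(pop == 0, 0.2, 0.4)
genotype = rng.binomial(2, freq)

# The phenotype depends only on a subpopulation-level environmental effect.
phenotype = 1.0 * pop + rng.normal(size=n)

# Unadjusted association: the null variant picks up the environmental effect.
naive = ols_slope(phenotype, genotype)

# Adjusted association: including the structure covariate removes the bias.
adjusted = ols_slope(phenotype, genotype, covariates=pop)

print(f"naive effect estimate:    {naive:.3f}")
print(f"adjusted effect estimate: {adjusted:.3f}")
```

In this sketch the unadjusted estimate is clearly nonzero even though the variant has no effect, while the adjusted estimate is close to zero. Summing many such spuriously nonzero estimates into a polygenic score is how small per-variant biases can accumulate, and the correction only works to the extent that the covariate actually tracks the environmental structure.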

Download data

  • Downloaded 1,313 times
  • Download rankings, all-time:
    • Site-wide: 15,923
    • In genetics: 742
  • Year to date:
    • Site-wide: 47,680
  • Since beginning of last month:
    • Site-wide: 62,606
