Population Stratification at the Phenotypic Variance level and Implication for the Analysis of Whole Genome Sequencing Data from Multiple Studies

By Tamar Sofer, Xiuwen Zheng, Cecelia A Laurie, Stephanie M Gogarten, Jennifer A Brody, Matthew P. Conomos, Joshua C Bis, Timothy A. Thornton, Adam Szpiro, Jeffrey R. O’Connell, Ethan M Lange, Yan Gao, L. Adrienne Cupples, Bruce M. Psaty, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Kenneth M Rice

Posted 05 Mar 2020
bioRxiv DOI: 10.1101/2020.03.03.973420

In modern Whole Genome Sequencing (WGS) epidemiological studies, participant-level data from multiple studies are often pooled and results are obtained from a single analysis. We consider the impact of phenotype variances that differ by study, which we term 'variance stratification'. If unaccounted for, variance stratification can lead to both decreased statistical power and increased false positive rates, depending on how allele frequencies, sample sizes, and phenotypic variances vary across the pooled studies. We describe a WGS-appropriate analysis approach, implemented in freely available software, that allows study-specific variances and thereby improves performance in practice. We also illustrate the variance stratification problem, its solutions, and a corresponding diagnostic procedure using data from the Trans-Omics for Precision Medicine Whole Genome Sequencing Program (TOPMed), in association tests for hemoglobin concentration and BMI.
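
To make the false-positive mechanism concrete, here is a minimal simulation sketch. It is not the authors' method or software (the abstract describes a WGS-appropriate approach; this is a deliberately simplified fixed-effects score test with one intercept per study). The function names score_z, simulate_null_study and type1_error, and all settings (two studies of 400 participants, minor allele frequencies 0.02 and 0.40, phenotype standard deviations 1 and 3), are hypothetical choices for illustration. Under the null of no association, one study has a common allele and a large phenotypic variance while the other has a rare allele and a small variance. A single pooled residual variance then understates the variance of the score, whereas study-specific variances weight each study correctly.

```python
import math
import random
from statistics import fmean


def score_z(studies, study_specific=True):
    """Score-test Z statistic for one variant under the null of no association.

    studies: list of (genotypes, phenotypes) pairs, one pair per study.
    Each study gets its own intercept (genotypes and phenotypes are centered
    within study). If study_specific is True, each study's residual variance
    is estimated separately; otherwise one pooled residual variance is used.
    """
    centered = []
    for g, y in studies:
        g_mean, y_mean = fmean(g), fmean(y)
        centered.append(([gi - g_mean for gi in g], [yi - y_mean for yi in y]))

    if not study_specific:
        # Single residual variance; degrees of freedom = n minus one intercept per study.
        n_total = sum(len(r) for _, r in centered)
        pooled_var = sum(ri * ri for _, r in centered for ri in r) / (n_total - len(centered))

    score, info = 0.0, 0.0
    for g_c, r_c in centered:
        if study_specific:
            var_s = sum(ri * ri for ri in r_c) / (len(r_c) - 1)
        else:
            var_s = pooled_var
        # Each study's contribution is weighted by the inverse of its (assumed) variance.
        score += sum(gi * ri for gi, ri in zip(g_c, r_c)) / var_s
        info += sum(gi * gi for gi in g_c) / var_s
    return score / math.sqrt(info)


def simulate_null_study(rng, n, maf, sd):
    """Genotypes ~ Binomial(2, maf); phenotypes independent of genotype (the null)."""
    g = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    y = [rng.gauss(0.0, sd) for _ in range(n)]
    return g, y


def type1_error(study_specific, reps=1000, n=400, seed=1):
    """Fraction of null replicates with |Z| > 1.96 (nominal two-sided alpha = 0.05)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        studies = [
            simulate_null_study(rng, n, maf=0.02, sd=1.0),  # rare allele, low variance
            simulate_null_study(rng, n, maf=0.40, sd=3.0),  # common allele, high variance
        ]
        if abs(score_z(studies, study_specific)) > 1.96:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("pooled variance:        ", type1_error(study_specific=False))
    print("study-specific variance:", type1_error(study_specific=True))
```

Because both calls use the same seed, they see identical simulated data, so the comparison is paired. Under these settings the pooled-variance test should reject noticeably more often than the nominal 5%, while the study-specific version should stay close to it. Reversing the pairing (common allele in the low-variance study) would instead make the pooled test conservative, which corresponds to the loss of power mentioned above.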

Download data

  • Downloaded 168 times
  • Download rankings, all-time:
    • Site-wide: 84,422 out of 101,349
    • In genetics: 4,408 out of 5,037
  • Year to date:
    • Site-wide: 36,774 out of 101,349
  • Since beginning of last month:
    • Site-wide: 46,782 out of 101,349
