
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 65,016 bioRxiv papers from 288,163 authors.

Sequencing and Imputation in GWAS: Cost-Effective Strategies to Increase Power and Genomic Coverage Across Diverse Populations

By Corbin Quick, Pramod Anugu, Solomon Musani, Scott T Weiss, Esteban G Burchard, Marquitta J White, Kevin L. Keys, NHLBI Trans-Omics for Precision Medicine (TOPMed), Francesco Cucca, Carlo Sidore, Michael Boehnke, Christian Fuchsberger

Posted 13 Feb 2019
bioRxiv DOI: 10.1101/548321

A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to capture the full spectrum of genetic variation, but remains prohibitively expensive for large samples. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture variation across a wider set of variants. However, imputation coverage and accuracy depend crucially on the reference panel size and genetic distance from the target population. Here, we consider a strategy in which a subset of study participants is sequenced and the rest array-genotyped and imputed using a reference panel that comprises the sequenced study participants and individuals from an external reference panel. We systematically assess how imputation quality and statistical power for association depend on the number of individuals sequenced and included in the reference panel for two admixed populations (African and Latino Americans) and two European population isolates (Sardinians and Finns). We develop a framework to identify powerful and cost-effective GWAS designs in these populations given current sequencing and array genotyping costs. For populations that are well-represented in current reference panels, we find that array genotyping alone is cost-effective and well-powered to detect both common- and rare-variant associations. For poorly represented populations, we find that sequencing a subset of study participants to improve imputation is often more cost-effective than array genotyping alone, and can substantially increase genomic coverage and power.
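The trade-off described in the abstract (sequence a subset, array-genotype and impute the rest) can be illustrated with a rough budget calculation. The sketch below is not the authors' framework. It assumes a fixed total budget, flat per-sample costs, and the common approximation that an imputed sample contributes roughly its imputation r² toward effective sample size. The function name, parameters, and all numbers in the usage example are hypothetical.

```python
def design_summary(budget, cost_seq, cost_array, n_seq, r2_imputed):
    """Summarize a mixed sequencing + array design under a fixed budget.

    All inputs are illustrative assumptions, not values from the paper:
      budget      total money available
      cost_seq    cost to sequence one participant
      cost_array  cost to array-genotype one participant
      n_seq       number of participants chosen for sequencing
      r2_imputed  assumed imputation r^2 at the variant of interest (0..1)

    Returns (n_array, n_effective).
    """
    if not 0.0 <= r2_imputed <= 1.0:
        raise ValueError("r2_imputed must be between 0 and 1")
    spent_on_sequencing = n_seq * cost_seq
    if spent_on_sequencing > budget:
        raise ValueError("budget cannot cover the requested sequencing")

    # Whatever remains after sequencing is spent on array genotyping.
    n_array = int((budget - spent_on_sequencing) // cost_array)

    # Approximation: sequenced samples count fully, imputed samples are
    # down-weighted by imputation r^2.
    n_effective = n_seq + n_array * r2_imputed
    return n_array, n_effective


if __name__ == "__main__":
    # Hypothetical costs and r^2 values, chosen only to show the comparison.
    array_only = design_summary(1_000_000, 1_000, 50, n_seq=0, r2_imputed=0.4)
    mixed = design_summary(1_000_000, 1_000, 50, n_seq=200, r2_imputed=0.7)
    print("array only:", array_only)
    print("mixed design:", mixed)
```

In practice r² itself depends on how many study participants are sequenced and added to the reference panel, which is the relationship the paper assesses empirically. Here it is passed in as a fixed assumption.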

Download data

  • Downloaded 576 times
  • Download rankings, all-time:
    • Site-wide: 16,820 out of 65,016
    • In genetics: 1,142 out of 3,688
  • Year to date:
    • Site-wide: 4,498 out of 65,016
  • Since beginning of last month:
    • Site-wide: 11,264 out of 65,016

