Sequencing and Imputation in GWAS: Cost-Effective Strategies to Increase Power and Genomic Coverage Across Diverse Populations

By Corbin Quick, Pramod Anugu, Solomon Musani, Scott T. Weiss, Esteban G. Burchard, Marquitta J White, Kevin L. Keys, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Francesco Cucca, Carlo Sidore, Michael Boehnke, Christian Fuchsberger

Posted 13 Feb 2019
bioRxiv DOI: 10.1101/548321 (published DOI: 10.1002/gepi.22326)

A key aim for current genome-wide association studies (GWAS) is to interrogate the full spectrum of genetic variation underlying human traits, including rare variants, across populations. Deep whole-genome sequencing is the gold standard to capture the full spectrum of genetic variation, but remains prohibitively expensive for large samples. Array genotyping interrogates a sparser set of variants, which can be used as a scaffold for genotype imputation to capture variation across a wider set of variants. However, imputation coverage and accuracy depend crucially on the reference panel size and genetic distance from the target population. Here, we consider a strategy in which a subset of study participants is sequenced and the rest array-genotyped and imputed using a reference panel that comprises the sequenced study participants and individuals from an external reference panel. We systematically assess how imputation quality and statistical power for association depend on the number of individuals sequenced and included in the reference panel for two admixed populations (African and Latino Americans) and two European population isolates (Sardinians and Finns). We develop a framework to identify powerful and cost-effective GWAS designs in these populations given current sequencing and array genotyping costs. For populations that are well-represented in current reference panels, we find that array genotyping alone is cost-effective and well-powered to detect both common- and rare-variant associations. For poorly represented populations, we find that sequencing a subset of study participants to improve imputation is often more cost-effective than array genotyping alone, and can substantially increase genomic coverage and power.
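The budget trade-off behind this design can be illustrated with a minimal sketch. It is not the authors' framework: the function names, the cost figures, and the imputation r² values below are hypothetical placeholders. It also assumes that sequenced participants contribute fully to the association analysis, and it uses the common approximation that an imputed sample contributes roughly r² of a directly genotyped one.

```python
def design_sample_sizes(budget, n_sequenced, cost_seq, cost_array):
    """Return (n_sequenced, n_array) affordable under a fixed budget.

    All arguments are hypothetical inputs; costs are per sample.
    """
    remaining = budget - n_sequenced * cost_seq
    if remaining < 0:
        raise ValueError("sequencing cost exceeds the budget")
    # Spend whatever is left on array genotyping.
    n_array = int(remaining // cost_array)
    return n_sequenced, n_array


def effective_sample_size(n_sequenced, n_array, imputation_r2):
    """Approximate effective sample size at one variant.

    Assumption: sequenced samples count fully, and array-genotyped
    samples are scaled by the imputation r^2 at that variant.
    """
    if not 0.0 <= imputation_r2 <= 1.0:
        raise ValueError("imputation_r2 must lie in [0, 1]")
    return n_sequenced + imputation_r2 * n_array


if __name__ == "__main__":
    # Placeholder costs and r^2 values, chosen only for illustration.
    budget, cost_seq, cost_array = 1_000_000, 1000, 50
    designs = {0: 0.60, 500: 0.90}  # n_sequenced -> assumed r^2 at a rare variant
    for n_seq, r2 in designs.items():
        n_s, n_a = design_sample_sizes(budget, n_seq, cost_seq, cost_array)
        print(n_s, n_a, effective_sample_size(n_s, n_a, r2))
```

In practice the r² achieved for a given number of sequenced participants would have to be estimated empirically for each population and allele-frequency range, which is the relationship the study assesses.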

Download data

  • Downloaded 911 times
  • Download rankings, all-time:
    • Site-wide: 39,094
    • In genetics: 1,526
  • Year to date:
    • Site-wide: 186,116
  • Since beginning of last month:
    • Site-wide: 188,235
