Rxivist logo

High-depth whole genome sequencing of a large population-specific reference panel: Enhancing sensitivity, accuracy, and imputation

By Todd Lencz, Jin Yu, Cameron Palmer, Shai Carmi, Danny Ben-Avraham, Nir Barzilai, Susan Bressman, Ariel Darvasi, Judy Cho, Lorraine N. Clark, Zeynep H Gumus, Vijai Joseph, Robert Klein, Steven Lipkin, Kenneth Offit, Harry Ostrer, Laurie J. Ozelius, Inga Peter, Gil Atzmon, Itsik Pe'er

Posted 24 Jul 2017
bioRxiv DOI: 10.1101/167924 (published DOI: 10.1007/s00439-018-1886-z)

Background: While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. In the present study, we sequenced at full-depth (≥30x) a moderately large (n=738) cohort of samples drawn from the Ashkenazi Jewish population across two platforms (Illumina X Ten and Complete Genomics, Inc.). We developed and refined a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population. Results: For samples sequenced on the Illumina X Ten platform, quality thresholds were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. The Complete Genomics, Inc. platform was more conservative (fewer variants called) compared to the Illumina platform, but also demonstrated relatively greater numbers of false positives that needed to be filtered. Quality control procedures also permitted detection of novel genome reads that are not mapped to current reference or alternate assemblies. After stringent quality control, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes. Conclusions: Our primary results demonstrate enhanced accuracy of a population-specific imputation panel relative to cosmopolitan panels, especially in the range of infrequent (<5% non-reference allele frequency) and rare (<1% non-reference allele frequency) variants that may be most critical to further progress in mapping of complex phenotypes.

Download data

  • Downloaded 635 times
  • Download rankings, all-time:
    • Site-wide: 56,955
    • In genomics: 4,141
  • Year to date:
    • Site-wide: 103,846
  • Since beginning of last month:
    • Site-wide: 161,731

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide