Hardy Weinberg Exact Test In Large Scale Variant Calling Quality Control

By Zhuoyi Huang, Navin Rustagi, Degui Zhi, L. Adrienne Cupples, Richard Gibbs, Eric Boerwinkle, Fuli Yu

Posted 19 Dec 2016
bioRxiv DOI: 10.1101/095521

The Hardy-Weinberg equilibrium (HWE) test is widely used as a quality control measure to detect sequencing artifacts such as mismapping, allelic dropout, and biases. However, in the high-throughput sequencing era, where sample sizes exceed a thousand, the utility of the HWE test in reducing the false positive rate remains unclear. In this paper, we demonstrate that the HWE test has limited power to identify sequencing artifacts when the variant allele frequency is lower than 1% in a variant call set produced from more than five thousand whole-genome-sequenced samples from two homogeneous populations. We develop a novel strategy for HWE filtering that incorporates site frequency spectrum information and determines the p-value cutoff that optimizes the tradeoff between sensitivity and specificity. This strategy is shown to outperform the exact test of HWE with an empirical constant p-value cutoff, regardless of the sequencing sample size. We also present best-practice recommendations for identifying possible sources of false positives in large sequencing datasets, based on an analysis of intrinsic biases in the variant calling process. Our strategy of determining the HWE test p-value cutoff and applying the test to common variants provides a practical approach to variant-level quality control in upcoming sequencing projects with tens to hundreds of thousands of samples.
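For readers unfamiliar with the exact test of HWE mentioned in the abstract, the following is a minimal sketch of the standard two-sided exact test for a biallelic site, computed from genotype counts. It is not the authors' implementation. The function names, the `p_cutoff` parameter, and the `min_maf` default (chosen only to mirror the 1% allele frequency mentioned above) are assumptions made for illustration, and the frequency-dependent cutoff selection described in the abstract is not reproduced here.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Two-sided exact HWE p-value from genotype counts at a biallelic site.

    Sums the probabilities of all heterozygote counts that are no more
    likely than the observed one, conditional on the allele counts.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_fact(k):
        return lgamma(k + 1)

    def log_prob(het):
        # Log-probability of `het` heterozygotes given n, n_a and n_b.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2) + log_fact(n)
                - log_fact(hom_a) - log_fact(het) - log_fact(hom_b)
                + log_fact(n_a) + log_fact(n_b) - log_fact(2 * n))

    # Heterozygote counts share the parity of the allele counts.
    minor = min(n_a, n_b)
    log_probs = {h: log_prob(h) for h in range(minor % 2, minor + 1, 2)}
    observed = log_probs[n_ab]
    p = sum(exp(lp) for lp in log_probs.values() if lp <= observed + 1e-9)
    return min(1.0, p)


def passes_hwe_filter(n_aa, n_ab, n_bb, p_cutoff, min_maf=0.01):
    """Hypothetical filter: test only common variants (assumed threshold).

    Variants with minor allele frequency below `min_maf` are passed
    through untested; `p_cutoff` must be supplied by the caller.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return True
    maf = min(2 * n_aa + n_ab, 2 * n_bb + n_ab) / (2 * n)
    if maf < min_maf:
        return True
    return hwe_exact_pvalue(n_aa, n_ab, n_bb) >= p_cutoff
```

For example, `hwe_exact_pvalue(50, 0, 50)` describes a site with no heterozygotes among 100 samples at 50% allele frequency, a pattern that departs strongly from HWE and yields a very small p-value. Working in log space with `lgamma` keeps the factorials tractable at sample sizes in the thousands.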

Download data

  • Downloaded 728 times
  • Download rankings, all-time:
    • Site-wide: 13,547 out of 70,857
    • In bioinformatics: 2,189 out of 6,936
  • Year to date:
    • Site-wide: 11,846 out of 70,857
  • Since beginning of last month:
    • Site-wide: 37,582 out of 70,857
