Leveraging ancestry to improve causal variant identification in exome sequencing for monogenic disorders

By Robert Brown, Hane Lee, Ascia Eskin, Gleb Kichaev, Kirk E Lohmueller, Bruno Reversade, Stanley F. Nelson, Bogdan Pasaniuc

Posted 04 Oct 2014
bioRxiv DOI: 10.1101/010017 (published DOI: 10.1038/ejhg.2015.68)

Recent breakthroughs in exome sequencing technology have made possible the identification of many causal variants of monogenic disorders. Although extremely powerful when closely related individuals (e.g., child and parents) are sequenced together, exome sequencing of a single affected individual (case-only sequencing) is often unsuccessful because of the large number of variants that need to be followed up for functional validation. Many approaches remove from consideration common variants above a given frequency threshold (e.g., 1%), and then prioritize the remaining variants according to their allele frequency and their functional, structural, and conservation properties. In this work, we present methods that leverage the genetic structure of different populations while accounting for the finite sample size of the reference panels to improve the variant filtering step. Using simulations and real exome data from individuals with monogenic disorders, we show that our methods significantly reduce the number of variants to be followed up (e.g., a 36% reduction for case-only sequenced individuals, from an average of 418 variants per exome when ancestry is ignored to 267 when ancestry is taken into account). Most importantly, our proposed approaches are well calibrated with respect to the probability of filtering out a true causal variant (i.e., the false negative rate, FNR), whereas existing approaches are susceptible to a high FNR when reference panel sizes are limited.
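The abstract does not spell out the filtering rule, so the following is only a minimal sketch of why finite panel size matters, not the authors' method. It assumes a simple binomial sampling model for allele counts in a reference panel; the function names and the `alpha` parameter (a target bound on the chance of wrongly filtering a variant whose true frequency is at or below the threshold) are hypothetical and introduced here for illustration.

```python
from math import comb


def binom_upper_tail(k, n, f):
    """P(X >= k) for X ~ Binomial(n, f), computed as 1 - P(X <= k - 1)."""
    lower = sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower)


def naive_filter(k, n, threshold=0.01):
    """Filter a variant if its observed panel frequency exceeds the threshold.

    k: copies of the allele seen in the panel; n: chromosomes in the panel.
    """
    return k / n > threshold


def calibrated_filter(k, n, threshold=0.01, alpha=0.05):
    """Assumed illustration: filter only when the observed count would be
    unlikely (probability <= alpha) if the true frequency equaled the threshold.
    """
    return binom_upper_tail(k, n, threshold) <= alpha


if __name__ == "__main__":
    # Small panel of 200 chromosomes, hypothetical counts.
    for k in (3, 10):
        print(k, naive_filter(k, 200), calibrated_filter(k, 200))
```

Under these assumptions, an allele seen 3 times among 200 chromosomes is removed by the naive rule (1.5% observed frequency) but kept by the tail-probability rule, because such a count is not surprising for a variant whose true frequency is 1%. In an ancestry-aware setting, `k` and `n` would presumably come from the panel that matches the sequenced individual's ancestry, which is often the smaller panel and therefore the case where the distinction matters most.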

Download data

  • Downloaded 596 times
  • Download rankings, all-time:
    • Site-wide: 81,104
    • In genetics: 3,050
  • Year to date:
    • Site-wide: 200,776
  • Since beginning of last month:
    • Site-wide: 176,956
