
Allele-specific transcription factor binding as a benchmark for assessing variant impact predictors

By Omar Wagih, Daniele Merico, Andrew Delong, Brendan J Frey

Posted 01 Feb 2018
bioRxiv DOI: 10.1101/253427

Genetic variation has long been known to alter transcription factor binding sites, sometimes with major phenotypic consequences. While the performance of current binding site predictors is well characterised, little is known about how well these models predict the impact of variants. We collected and curated over 132,000 potential allele-specific binding (ASB) ChIP-seq variants across 101 transcription factors (TFs). We then assessed the accuracy of TF binding models from five different methods on these high-confidence measurements, finding that deep learning methods performed best yet still have room for improvement. Importantly, machine learning methods were consistently better than the venerable position weight matrix (PWM). Finally, predictions for certain TFs were consistently poor, and our investigation supports efforts to use features beyond sequence, such as methylation, DNA shape, and post-translational modifications. We submit that ASB data are a valuable benchmark for variant impact on TF binding.
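
To make the PWM baseline concrete, here is a minimal sketch of one common way to score a variant's predicted effect on binding: take the best log-odds PWM score over windows that overlap the variant, for the reference and the alternate allele, and report the difference. The matrix values, the uniform background, and the names `window_score`, `best_overlapping_score`, and `variant_delta` are illustrative assumptions; this is not necessarily the scoring procedure used in the paper.

```python
import math

# Hypothetical 4-position PWM given as per-position base probabilities.
# These values are made up for illustration and do not come from the paper.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(window):
    """Log-odds score (base 2) of a sequence window the same length as the PWM."""
    return sum(math.log2(PWM[i][base] / BACKGROUND) for i, base in enumerate(window))


def best_overlapping_score(seq, pos):
    """Best PWM score among windows that cover position `pos` (0-based)."""
    width = len(PWM)
    first = max(0, pos - width + 1)
    last = min(pos, len(seq) - width)
    return max(window_score(seq[s:s + width]) for s in range(first, last + 1))


def variant_delta(ref_seq, pos, alt):
    """Alternate-minus-reference score; negative means predicted loss of binding."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_overlapping_score(alt_seq, pos) - best_overlapping_score(ref_seq, pos)


# Example: a G>A change at position 3 disrupts the best-matching site "AGCT".
delta = variant_delta("TTAGCTTT", 3, "A")
```

In a benchmark of the kind described above, the sign and magnitude of such a score would be compared with the allelic imbalance observed in ChIP-seq reads at the same variant.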

Download data

  • Downloaded 1,524 times
  • Download rankings, all-time:
    • Site-wide: 12,764
    • In bioinformatics: 1,479
  • Year to date:
    • Site-wide: 69,838
  • Since beginning of last month:
    • Site-wide: 93,810
