Virtual ChIP-seq: Predicting transcription factor binding by learning from the transcriptome

By Mehran Karimzadeh, Michael M. Hoffman

Posted 28 Feb 2018
bioRxiv DOI: 10.1101/168419

MOTIVATION: Identifying transcription factor binding sites is the first step in pinpointing non-coding mutations that disrupt the regulatory function of transcription factors and promote disease. ChIP-seq is the most common method for identifying binding sites, but performing it on patient samples is hampered by the amount of available biological material and the cost of the experiment. Existing methods for computational prediction of regulatory elements primarily predict binding in genomic regions with sequence similarity to known transcription factor sequence preferences. This has limited efficacy since most binding sites do not resemble known transcription factor sequence motifs, and many transcription factors are not even sequence-specific.

RESULTS: We developed Virtual ChIP-seq, which predicts binding of individual transcription factors in new cell types using an artificial neural network that integrates ChIP-seq results from other cell types and chromatin accessibility data in the new cell type. Virtual ChIP-seq also uses learned associations between gene expression and transcription factor binding at specific genomic regions. This approach outperforms methods that predict TF binding solely based on sequence preference, predicting binding for 36 transcription factors (Matthews correlation coefficient > 0.3).

AVAILABILITY: The datasets we used for training and validation are available at https://virchip.hoffmanlab.org. We have deposited in Zenodo the current version of our software (http://doi.org/10.5281/zenodo.1066928), datasets (http://doi.org/10.5281/zenodo.823297), predictions for 36 transcription factors on Roadmap Epigenomics cell types (http://doi.org/10.5281/zenodo.1455759), and predictions in Cistrome as well as ENCODE-DREAM in vivo TF Binding Site Prediction Challenge (http://doi.org/10.5281/zenodo.1209308).
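
The abstract reports performance as a Matthews correlation coefficient (MCC) above 0.3. For readers unfamiliar with the metric, the following is a minimal sketch of how MCC can be computed from binary labels, for example one label per genomic bin (1 = bound, 0 = unbound). It is not the authors' evaluation code; the function name `matthews_corrcoef` and the toy labels are assumptions made for illustration only.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1).

    Illustrative helper only; not taken from the Virtual ChIP-seq software.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is 0 when any marginal count is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy labels for four genomic bins.
observed = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
mcc = matthews_corrcoef(observed, predicted)
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level prediction. Because it uses all four cells of the confusion matrix, it stays informative when one class is much rarer than the other, which is typically the case for bound versus unbound genomic regions.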

Download data

  • Downloaded 6,256 times
  • Download rankings, all-time:
    • Site-wide: 1,963
    • In bioinformatics: 128
  • Year to date:
    • Site-wide: 10,102
  • Since beginning of last month:
    • Site-wide: 14,264
