Prediction of RNA-protein sequence and structure binding preferences using deep convolutional and recurrent neural networks

By Xiaoyong Pan, Peter Rijnbeek, Junchi Yan, Hong-Bin Shen

Posted 05 Jun 2017
bioRxiv DOI: 10.1101/146175 (published DOI: 10.1186/s12864-018-4889-1)

RNA regulation depends significantly on its binding protein partners, known as RNA-binding proteins (RBPs). Unfortunately, the binding preferences of most RBPs are still not well characterized, especially from a structural point of view. Hidden informative signals and the interdependencies between sequence and structure specificities are two challenges for both predicting RBP binding sites and accurately mining sequence and structure motifs. In this study, we propose a deep learning-based method, iDeepS, to simultaneously identify the binding sequence and structure motifs from RNA sequences using convolutional neural networks (CNNs) and a bidirectional long short-term memory network (BLSTM). We first one-hot encode both the sequence and the predicted secondary structure, which makes them suitable for the subsequent convolution operations. To reveal the hidden binding knowledge in the observations, the CNNs are applied to learn abstract motif features. Considering the close relationship between sequences and predicted structures, we use the BLSTM to capture the long-range dependencies between the binding sequence and structure motifs identified by the CNNs. Finally, the learned weighted representations are fed into a classification layer to predict the RBP binding sites. We evaluated iDeepS on verified RBP binding sites derived from large-scale representative CLIP-seq datasets. The results demonstrate that iDeepS can reliably predict RBP binding sites on RNAs and that it outperforms state-of-the-art methods. An important advantage is that iDeepS automatically extracts both binding sequence and structure motifs, which will improve our understanding of the mechanisms behind the binding specificities of RBPs. iDeepS is available at https://github.com/xypan1232/iDeepS.
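The one-hot encoding and convolution steps described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the iDeepS implementation: the function names `one_hot` and `conv1d_relu` are hypothetical, the structure alphabet `FTIHMS` is an assumption made for illustration, and the filter is hand-written rather than learned. In a trained model, many filters would be learned from data and their outputs passed on to the BLSTM and the classification layer.

```python
RNA_ALPHABET = "ACGU"
# Assumed structure-context labels, used here for illustration only.
STRUCT_ALPHABET = "FTIHMS"


def one_hot(symbols, alphabet):
    """Encode a string as one row per position, one column per alphabet symbol.

    Symbols outside the alphabet (for example 'N') become all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in symbols.upper():
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


def conv1d_relu(matrix, kernel):
    """Slide a single filter (width x channels) along the positions.

    Uses 'valid' mode (no padding) and applies a ReLU to each window score.
    """
    width = len(kernel)
    activations = []
    for start in range(len(matrix) - width + 1):
        score = sum(
            matrix[start + k][c] * kernel[k][c]
            for k in range(width)
            for c in range(len(kernel[k]))
        )
        activations.append(max(0.0, score))
    return activations


# Toy example: a hand-written filter that responds to the motif "UGC".
seq = "ACGUGCAU"
seq_matrix = one_hot(seq, RNA_ALPHABET)
ugc_filter = one_hot("UGC", RNA_ALPHABET)
activations = conv1d_relu(seq_matrix, ugc_filter)

# Position with the strongest filter response.
best = max(range(len(activations)), key=activations.__getitem__)

# The predicted structure string is encoded the same way, with its own alphabet.
struct_matrix = one_hot("FFSSHHSS", STRUCT_ALPHABET)
```

Because each filter scores short windows of the one-hot matrix, a strongly activating window can be read back as a candidate motif, which is the general idea behind extracting sequence and structure motifs from convolutional filters.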

Download data

  • Downloaded 2,162 times
  • Download rankings, all-time:
    • Site-wide: 7,685
    • In bioinformatics: 879
  • Year to date:
    • Site-wide: 82,816
  • Since beginning of last month:
    • Site-wide: 105,061
