ssHMM: Extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data
RNA-binding proteins (RBPs) play an important role in post-transcriptional regulation of RNA and recognize their target RNAs via sequence-structure motifs. To what extent RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account or produce models that are not directly interpretable as sequence-structure motifs. Thus, a tool that produces informative motifs and at the same time captures the relationship between RNA primary sequence and secondary structure is missing. We developed ssHMM, an RNA motif finder that combines a hidden Markov model (HMM) with Gibbs sampling to learn the joint sequence and structure binding preferences of RBPs from high-throughput data, such as CLIP-Seq sequences, and visualizes them intuitively as a graph. Evaluations on synthetic data showed that ssHMM reliably recovers fuzzy sequence motifs in 80 to 100% of cases, outperforming state-of-the-art methods designed for a similar task. On real data, it produces motifs with higher information content than existing tools. Additionally, ssHMM is considerably faster than other methods on large data sets. We also discuss examples of novel sequence-structure motifs for uncharacterized RBPs identified by ssHMM. ssHMM is freely available on GitHub.
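The abstract compares tools by the information content of their motifs. As a rough illustration, the sketch below computes the standard Shannon-based information content of a motif given as per-position nucleotide probabilities. That ssHMM's evaluation uses exactly this definition is an assumption, and the function names and the example motif are hypothetical, not taken from ssHMM.

```python
import math


def column_information(probs):
    """Information content in bits of one motif position.

    Uses the standard definition log2(K) - H(p), where K is the alphabet
    size and H(p) is the Shannon entropy of the position's distribution.
    """
    k = len(probs)
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return math.log2(k) - entropy


def motif_information(columns):
    """Total information content: sum over all motif positions."""
    return sum(column_information(col) for col in columns)


# Hypothetical 3-position motif; each row gives P(A), P(C), P(G), P(U).
example_motif = [
    [1.0, 0.0, 0.0, 0.0],      # fully conserved position
    [0.25, 0.25, 0.25, 0.25],  # uninformative position
    [0.5, 0.5, 0.0, 0.0],      # two equally likely nucleotides
]

if __name__ == "__main__":
    for i, col in enumerate(example_motif, start=1):
        print(f"position {i}: {column_information(col):.2f} bits")
    print(f"total: {motif_information(example_motif):.2f} bits")
```

Under this definition, a fully conserved position carries the maximum of 2 bits over a four-letter alphabet and a uniform position carries none, so a motif with higher total information content is more specific.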