Rxivist logo

LinearFold: Linear-Time Prediction of RNA Secondary Structures

By Dezhong Deng, Kai Zhao, David Hendrix, David H Mathews, Liang Huang

Posted 12 Feb 2018
bioRxiv DOI: 10.1101/263509

Predicting the secondary structure of an RNA sequence with speed and accuracy is useful in many applications such as drug design. The state-of-the-art predictors have a fundamental limitation: they have a run time that scales with the third power of the length of the input sequence, which is slow for longer RNAs and limits the use of secondary structure prediction in genome-wide applications. To address this bottleneck, we designed the first linear-time algorithm for RNA secondary structure prediction, which can be used with both thermodynamic and machine-learned scoring functions. Our algorithm, like previous work, is based on dynamic programming (DP), but with two crucial differences: (a) we incrementally process the sequence in a left-to-right rather than in a bottom-up fashion, and (b) because of this processing, we can further employ beam search pruning to ensure linear runtime in practice (with the cost of exact search). Even though our search is approximate, surprisingly, it results in even higher overall accuracy on a diverse database of se- quences with known structures. In particular, it leads to significantly more accurate predictions on the longest sequence families in that database (16S and 23S Ribosomal RNAs), as well as improved accu- racies for long-range base pairs (500+ nucleotides apart).

Download data

  • Downloaded 1,678 times
  • Download rankings, all-time:
    • Site-wide: 4,887 out of 89,518
    • In bioinformatics: 900 out of 8,443
  • Year to date:
    • Site-wide: 6,437 out of 89,518
  • Since beginning of last month:
    • Site-wide: 27,076 out of 89,518

Altmetric data


Downloads over time

Distribution of downloads per paper, site-wide


PanLingua

Sign up for the Rxivist weekly newsletter! (Click here for more details.)


News