STREME: Accurate and versatile sequence motif discovery

By Timothy L Bailey

Posted 23 Nov 2020
bioRxiv DOI: 10.1101/2020.11.23.394619

Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences, for example the binding site motifs of DNA- and RNA-binding proteins. The STREME algorithm presented here advances the state of the art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating the discovered motifs against reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive, thorough and rapid than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs and Weeder). STREME can find motifs in datasets with hundreds of thousands of sequences, find both short and long motifs (from 3 to 30 positions), perform differential motif discovery in pairs of sequence datasets, and find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME accurately estimates and reports the statistical significance of each motif that it discovers. STREME is easy to use via its web server at http://meme-suite.org, and is fully integrated with the widely used MEME Suite of sequence analysis tools, which can be freely downloaded from the same web site for non-commercial use.
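To make the idea of differential motif discovery with a reported significance more concrete, the following is a minimal, generic sketch and not STREME's actual algorithm, which the abstract does not describe. It assumes a fixed word stands in for a motif, counts how many primary and control sequences contain it, and scores the enrichment with a one-sided Fisher's exact test. The function names and the toy sequences are hypothetical.

```python
from math import comb


def count_sequences_with(word, sequences):
    """Count sequences that contain `word` at least once."""
    return sum(1 for seq in sequences if word in seq)


def fisher_enrichment_pvalue(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher's exact test for enrichment in the primary set.

    Returns the probability of seeing at least `pos_hits` matching
    sequences in the primary set if the matches were distributed at
    random between the primary and control sets (hypergeometric tail).
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    denom = comb(total, pos_total)
    upper = min(hits, pos_total)
    tail = sum(
        comb(hits, k) * comb(total - hits, pos_total - k)
        for k in range(pos_hits, upper + 1)
    )
    return tail / denom


# Toy data (hypothetical): primary sequences vs. control sequences.
primary = ["ACGTGA", "TTACGT", "ACGTAC", "GGGACG"]
control = ["TTTTTT", "GGGGGG", "CCCCCC", "ACGTTT"]

word = "ACGT"
pos_hits = count_sequences_with(word, primary)
neg_hits = count_sequences_with(word, control)
p_value = fisher_enrichment_pvalue(pos_hits, len(primary), neg_hits, len(control))
print(word, pos_hits, neg_hits, p_value)
```

A real motif discovery tool would search over many candidate motifs and would therefore also need to correct for multiple testing, which this sketch omits.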

Download data

  • Downloaded 1,242 times
  • Download rankings, all-time:
    • Site-wide: 20,412
    • In bioinformatics: 2,307
  • Year to date:
    • Site-wide: 5,272
  • Since beginning of last month:
    • Site-wide: 15,405
