
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 67,591 bioRxiv papers from 298,050 authors.

Motto: Representing motifs in consensus sequences with minimum information loss

By Mengchi Wang, David Wang, Kai Zhang, Vu Ngo, Shicai Fan, Wei Wang

Posted 13 Apr 2019
bioRxiv DOI: 10.1101/607408

Sequence analysis frequently requires an intuitive understanding and a convenient representation of motifs. Typically, motifs are represented as position weight matrices (PWMs) and visualized using sequence logos. However, in many scenarios, representing motifs by wildcard-style consensus sequences is compact and sufficient for interpreting the motif information and searching for motif matches. Based on mutual information theory and Jensen-Shannon divergence, we propose a mathematical framework to minimize the information loss in converting PWMs to consensus sequences. We name this representation sequence Motto and have implemented an efficient algorithm with flexible options for converting motif PWMs into Motto for nucleotides, amino acids, and customized alphabets. Here we show that this representation provides a simple and efficient way to identify the binding sites of 1,156 common transcription factors (TFs) in the human genome. The effectiveness of the method was benchmarked by comparing sequence matches found by Motto with PWM scanning results found by FIMO. On average, our method achieves an area under the precision-recall curve of 0.81, significantly (p-value < 0.01) outperforming all existing methods, including maximal positional weight, Douglas, and minimal mean square error. We believe this representation provides a distilled summary of a motif, together with its statistical justification.
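The idea of choosing a consensus symbol by minimizing Jensen-Shannon divergence can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a DNA alphabet, a PWM given as a list of per-position probability dictionaries, IUPAC ambiguity codes as the output notation, and a brute-force search over all nucleotide subsets. The function names `jsd`, `best_symbol`, and `consensus` are hypothetical, and any options or penalties the real Motto tool applies are omitted.

```python
import math
from itertools import combinations

ALPHABET = "ACGT"

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they allow.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def jsd(p, q):
    """Jensen-Shannon divergence (in bits) between two distributions over ALPHABET."""
    m = {b: 0.5 * (p[b] + q[b]) for b in ALPHABET}

    def kl(a, b):
        # Terms with a[x] == 0 contribute nothing; b[x] > 0 whenever a[x] > 0.
        return sum(a[x] * math.log2(a[x] / b[x]) for x in ALPHABET if a[x] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def best_symbol(column):
    """Pick the base subset whose uniform distribution is closest (by JSD) to the column."""
    best_div, best_set = None, None
    for size in range(1, len(ALPHABET) + 1):
        for subset in combinations(ALPHABET, size):
            q = {b: (1.0 / size if b in subset else 0.0) for b in ALPHABET}
            d = jsd(column, q)
            if best_div is None or d < best_div:
                best_div, best_set = d, frozenset(subset)
    return IUPAC[best_set]


def consensus(pwm):
    """Convert a PWM (list of per-position probability dicts) to a consensus string."""
    return "".join(best_symbol(col) for col in pwm)


# Toy PWM (made up for illustration): a fixed A, an A/G position, and a uniform position.
example_pwm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.5, "C": 0.0, "G": 0.5, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
print(consensus(example_pwm))  # prints ARN
```

Each toy column here exactly matches a uniform distribution over some subset, so the divergence is zero for the chosen symbol. Real PWM columns are rarely this clean, and the subset that minimizes the divergence can be less obvious than picking the most frequent base.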

Download data

  • Downloaded 291 times
  • Download rankings, all-time:
    • Site-wide: 37,571 out of 67,591
    • In bioinformatics: 4,581 out of 6,655
  • Year to date:
    • Site-wide: 15,710 out of 67,591
  • Since beginning of last month:
    • Site-wide: 18,937 out of 67,591

