
SeqOthello: Query over RNA-seq experiments at scale

By Ye Yu, Jinpeng Liu, Xinan Liu, Yi Zhang, Eamonn Magner, Chen Qian, Jinze Liu

Posted 01 Feb 2018
bioRxiv DOI: 10.1101/258772

We present SeqOthello, an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello requires only five minutes to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets on a standard computer with 19.1 GB of memory. The query recovers 92.7% of the tier-1 fusions curated by the TCGA Fusion Gene Database and further reveals 270 novel fusion occurrences, all of which appear to be tumor-specific. The entire index is only 76 GB, a 700:1 compression ratio relative to the original sequencing data, which makes it extremely portable. This is the first sequence search index constructed at the scale of TCGA data. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs. SeqOthello is currently available at https://github.com/LiuBioinfo/SeqOthello.
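
To put the figures in the abstract in perspective, the short sketch below derives a few implied quantities from them: the approximate size of the original sequencing data, the average index footprint per dataset, and the average query rate. It is a back-of-the-envelope calculation only. The function names are hypothetical, decimal units (1 TB = 1000 GB) are an assumption, and "five minutes" is treated as exactly 300 seconds, so the results are rough estimates rather than numbers reported by the authors.

```python
# Figures quoted in the abstract.
INDEX_SIZE_GB = 76          # size of the full SeqOthello index
COMPRESSION_RATIO = 700     # stated 700:1 ratio versus the original data
NUM_DATASETS = 10_113       # TCGA Pan-Cancer RNA-seq datasets
NUM_FUSION_EVENTS = 11_658  # fusion events surveyed
QUERY_MINUTES = 5           # stated query time, treated as exact (assumption)


def implied_raw_size_tb(index_gb: float, ratio: float) -> float:
    """Estimate the original data size in TB, assuming 1 TB = 1000 GB."""
    return index_gb * ratio / 1000


def index_mb_per_dataset(index_gb: float, num_datasets: int) -> float:
    """Average index footprint per dataset in MB, assuming 1 GB = 1000 MB."""
    return index_gb * 1000 / num_datasets


def events_per_second(num_events: int, minutes: float) -> float:
    """Average number of fusion events surveyed per second."""
    return num_events / (minutes * 60)


if __name__ == "__main__":
    raw_tb = implied_raw_size_tb(INDEX_SIZE_GB, COMPRESSION_RATIO)
    per_dataset = index_mb_per_dataset(INDEX_SIZE_GB, NUM_DATASETS)
    rate = events_per_second(NUM_FUSION_EVENTS, QUERY_MINUTES)
    print(f"Implied original data size: about {raw_tb:.1f} TB")
    print(f"Average index footprint: about {per_dataset:.1f} MB per dataset")
    print(f"Average query rate: about {rate:.0f} fusion events per second")
```

Under these assumptions, the original data comes to roughly 53 TB, which suggests why a 76 GB index is described as portable.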

Download data

  • Downloaded 778 times
  • Download rankings, all-time:
    • Site-wide: 56,150
    • In bioinformatics: 5,321
  • Year to date:
    • Site-wide: 191,788
  • Since beginning of last month:
    • Site-wide: 160,432
