
Boiler: Lossy compression of RNA-seq alignments using coverage vectors

By Jacob Pritt, Ben Langmead

Posted 22 Feb 2016
bioRxiv DOI: 10.1101/040634 (published DOI: 10.1093/nar/gkw540)

We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Because most per-read data is discarded, the storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results given by downstream tools for isoform assembly and quantification. Boiler also allows the user to pose fast and useful queries without decompressing the entire file. Boiler is free, open-source software available from https://github.com/jpritt/boiler.
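
To make the idea of a coverage vector plus summary distributions concrete, here is a minimal Python sketch. It is not Boiler's implementation: the function names, the half-open (start, end) interval representation, and the use of run-length encoding are assumptions made for illustration only. It shows how a set of alignments can be reduced to a per-base coverage vector and a read-length distribution, after which the individual reads no longer need to be stored.

```python
from collections import Counter
from itertools import groupby


def coverage_vector(alignments, genome_length):
    """Per-base read depth from half-open (start, end) alignment intervals."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    # A running sum of the differences gives the depth at each position.
    coverage, depth = [], 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def run_length_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def read_length_distribution(alignments):
    """Empirical distribution of read lengths (hypothetical summary statistic)."""
    return Counter(end - start for start, end in alignments)


# Toy data: four reads on a 12-base reference (illustrative, not real RNA-seq).
alignments = [(0, 4), (2, 6), (2, 6), (8, 10)]
coverage = coverage_vector(alignments, 12)
encoded = run_length_encode(coverage)
lengths = read_length_distribution(alignments)

print(coverage)  # [1, 1, 3, 3, 2, 2, 0, 0, 1, 1, 0, 0]
print(encoded)   # [(1, 2), (3, 2), (2, 2), (0, 2), (1, 2), (0, 2)]
print(lengths)   # Counter({4: 3, 2: 1})
```

In this toy case the two identical reads at (2, 6) become indistinguishable once only the coverage and the length distribution are kept, which illustrates why such a scheme is lossy at the level of individual reads.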

Download data

  • Downloaded 761 times
  • Download rankings, all-time:
    • Site-wide: 17,418 out of 88,857
    • In bioinformatics: 2,629 out of 8,400
  • Year to date:
    • Site-wide: 73,804 out of 88,857
  • Since beginning of last month:
    • Site-wide: 64,896 out of 88,857

