Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

By Avi Srivastava, Laraib Malik, Tom Smith, Ian Sudbery, Rob Patro

Posted 01 Jun 2018
bioRxiv DOI: 10.1101/335000 (published DOI: 10.1186/s13059-019-1670-y)

We introduce alevin, an efficient pipeline for gene quantification from dscRNA-seq (droplet-based single-cell RNA-seq) data. Alevin is an end-to-end quantification pipeline that starts from sample-demultiplexed FASTQ files and generates gene-level counts for two popular droplet-based sequencing protocols (Drop-seq [1] and 10x Chromium [2]). Importantly, alevin handles all processing internally, avoiding both reliance on external pipeline programs and the need to write large intermediate files to disk. Alevin adopts efficient algorithms for cellular-barcode whitelist generation, cellular-barcode correction, and lightweight per-cell UMI deduplication and quantification. This integrated design allows alevin to process data much faster (typically ~10 times faster) than other approaches while working within a reasonable memory budget. For example, it completes a full, end-to-end analysis of a single-cell human experiment consisting of ~4,500 cells and 335 million reads in 27 minutes, using 13 GB of RAM and 8 threads of an Intel Xeon E5-2699 v4 CPU.
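To make the terms barcode correction and UMI deduplication concrete, here is a minimal Python sketch of the general idea. It is not alevin's algorithm: it assumes a naive one-mismatch correction against a given whitelist and counts distinct UMIs per (cell, gene) pair. The function names `correct_barcode` and `count_genes`, and the `(barcode, umi, gene)` read tuples, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def correct_barcode(barcode, whitelist):
    """Map a barcode onto the whitelist (illustrative, not alevin's method).

    Returns the barcode itself if it is whitelisted, the single whitelisted
    barcode one substitution away if exactly one exists, and None otherwise.
    """
    if barcode in whitelist:
        return barcode
    candidates = set()
    for i, base in enumerate(barcode):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = barcode[:i] + alt + barcode[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    # Ambiguous or unmatched barcodes are discarded.
    return candidates.pop() if len(candidates) == 1 else None


def count_genes(reads, whitelist):
    """Count distinct UMIs per (cell, gene) after barcode correction.

    `reads` is an iterable of (barcode, umi, gene) tuples; this assumes each
    read has already been assigned to a single gene.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        cell = correct_barcode(barcode, whitelist)
        if cell is not None:
            umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}
```

Reads whose barcode cannot be matched unambiguously are dropped, and repeated UMIs within the same cell and gene are counted once. A real pipeline also has to build the whitelist from the data and handle reads that map to several genes, which this sketch leaves out.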

Download data

  • Downloaded 2,510 times
  • Download rankings, all-time:
    • Site-wide: 2,499 out of 89,266
    • In bioinformatics: 458 out of 8,426
  • Year to date:
    • Site-wide: 15,829 out of 89,266
  • Since beginning of last month:
    • Site-wide: 24,708 out of 89,266
