CellBender remove-background: a deep generative model for unsupervised removal of background noise from scRNA-seq datasets

By Stephen J. Fleming, John Marioni, Mehrtash Babadi

Posted 03 Oct 2019
bioRxiv DOI: 10.1101/791699

Droplet-based scRNA-seq assays are known to produce a significant amount of background RNA counts, the hallmark of which is non-zero transcript counts in presumably empty droplets. The presence of background RNA can lead to systematic biases and batch effects in various downstream analyses such as differential expression and marker gene discovery. In this paper, we explore the phenomenology and mechanisms of background RNA generation in droplet-based scRNA-seq assays and present a deep generative model of background-contaminated counts mirroring those mechanisms. The model is used for learning the background RNA profile, distinguishing cell-containing droplets from empty ones, and retrieving background-free gene expression profiles. We implement the model along with a fast and scalable inference algorithm as the remove-background module in CellBender, an open-source scRNA-seq data processing software package. Finally, we present simulations and investigations of several scRNA-seq datasets to show that processing raw data using CellBender significantly boosts the magnitude and specificity of differential expression across different cell types.
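The idea of learning a background profile from presumably empty droplets can be illustrated with a deliberately naive sketch. This is not the CellBender model, which the abstract describes as a deep generative model with its own inference algorithm. The function names, the total-count threshold for calling a droplet empty, and the fixed number of background counts per droplet are all assumptions made for illustration.

```python
# Naive illustration only: NOT the CellBender remove-background model.
# All names and the thresholding rule below are hypothetical.

def estimate_ambient_profile(counts, empty_threshold):
    """Estimate per-gene background fractions from low-count droplets.

    counts: list of droplets, each a list of per-gene UMI counts.
    empty_threshold: droplets whose total count is at or below this
        value are assumed (not known) to be empty.
    """
    n_genes = len(counts[0])
    totals = [0] * n_genes
    for droplet in counts:
        if sum(droplet) <= empty_threshold:
            for gene, count in enumerate(droplet):
                totals[gene] += count
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no counts found in presumed-empty droplets")
    # Normalize so the profile sums to 1 across genes.
    return [t / grand_total for t in totals]


def subtract_background(droplet, profile, n_background):
    """Subtract the expected background counts, clipping at zero.

    n_background: assumed number of background counts in this droplet.
    """
    return [max(0.0, count - n_background * p)
            for count, p in zip(droplet, profile)]


# Toy data: two presumed-empty droplets and two cell-containing droplets.
toy_counts = [[2, 0, 2], [1, 1, 2], [50, 0, 10], [0, 40, 8]]
profile = estimate_ambient_profile(toy_counts, empty_threshold=10)
cleaned = subtract_background(toy_counts[2], profile, n_background=8)
```

A hard threshold and a flat subtraction ignore the per-droplet uncertainty that a probabilistic model is meant to capture, so this sketch only conveys the intuition behind the abstract.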

Download data

  • Downloaded 4,075 times
  • Download rankings, all-time:
    • Site-wide: 3,402
    • In bioinformatics: 292
  • Year to date:
    • Site-wide: 2,264
  • Since beginning of last month:
    • Site-wide: 2,862
