SampleQC: robust multivariate, multi-celltype, multi-sample quality control for single cell data

By Will Macnair, Mark D Robinson

Posted 28 Aug 2021
bioRxiv DOI: 10.1101/2021.08.28.458012

Quality control (QC) is a critical component of single cell RNA-seq processing pipelines. Many single cell methods assume that scRNA-seq data comprise multiple celltypes that are distinct in terms of gene expression; however, this assumption is not reflected in current approaches to QC. We show that the current widely used methods for QC may be biased towards excluding rarer celltypes, especially those whose QC metrics are more extreme, e.g. those with naturally high mitochondrial proportions. We introduce SampleQC, which improves sensitivity and reduces bias relative to current industry standard approaches, via a robust Gaussian mixture model fit across multiple samples simultaneously. We show via simulations that SampleQC is less susceptible than other methods to excluding rarer celltypes. We also demonstrate SampleQC on complex real data, comprising up to 867k cells over 172 samples. The framework for SampleQC is general, and has applications as an outlier detection method for data beyond single cell RNA-seq. SampleQC is parallelized and implemented in Rcpp, and is available as an R package.
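The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the SampleQC implementation: SampleQC fits a robust multivariate mixture across many samples, whereas this example fits a plain (non-robust) two-component Gaussian mixture by EM to a single simulated QC metric in one sample. The simulated values, the 3-MAD and 3-SD cutoffs, and all function names are assumptions made for illustration only.

```python
import math
import random
import statistics

def simulate_logit_mito(seed=0):
    """Simulated (not real) logit mitochondrial proportions for two celltypes."""
    rng = random.Random(seed)
    common = [rng.gauss(-3.0, 0.3) for _ in range(900)]  # abundant celltype
    rare = [rng.gauss(-1.0, 0.3) for _ in range(100)]    # rare, naturally high mito
    return common, rare

def mad_upper_threshold(values, n_mads=3.0):
    """Single upper cutoff: median + n_mads * scaled MAD of the pooled cells."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med + n_mads * 1.4826 * mad

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(values, n_iter=100):
    """Plain EM for a 1-D, two-component Gaussian mixture (no robustness)."""
    mus = [min(values), max(values)]
    sigmas = [statistics.pstdev(values)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each cell
        resp = []
        for x in values:
            dens = [w * normal_pdf(x, m, s)
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weight, mean and standard deviation per component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mus[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2
                      for r, x in zip(resp, values)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / len(values)
    return weights, mus, sigmas

def is_mixture_outlier(x, mus, sigmas, n_sds=3.0):
    """A cell is an outlier only if it is far from every fitted component."""
    return all(abs(x - m) / s > n_sds for m, s in zip(mus, sigmas))

common, rare = simulate_logit_mito()
pooled = common + rare

threshold = mad_upper_threshold(pooled)
rare_excluded_mad = sum(x > threshold for x in rare) / len(rare)

weights, mus, sigmas = fit_two_gaussians(pooled)
rare_excluded_mix = sum(is_mixture_outlier(x, mus, sigmas) for x in rare) / len(rare)

print(f"rare cells excluded by pooled MAD cutoff: {rare_excluded_mad:.0%}")
print(f"rare cells excluded by mixture rule:      {rare_excluded_mix:.0%}")
```

In this toy setting the single pooled cutoff is driven by the abundant celltype and so removes most of the rare, high-mitochondrial cells, while the mixture rule compares each cell with the component it most resembles. A robust fit, as the abstract describes, would additionally stop genuine outliers from distorting the component estimates, which plain EM does not guarantee.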

Download data

  • Downloaded 130 times
  • Download rankings, all-time:
    • Site-wide: 147,361
    • In bioinformatics: 11,246
  • Year to date:
    • Site-wide: 67,219
  • Since beginning of last month:
    • Site-wide: 4,787
