Comprehensive generation, visualization, and reporting of quality control metrics for single-cell RNA sequencing data

By Rui Hong, Yusuke Koga, Shruthi Bandyadka, Anastasia Leshchyk, Zhe Wang, Salam Alabdullatif, Yichen Wang, Vidya Akavoor, Xinyun Cao, Irzam Sarfraz, Frederick Jansen, W. Evan Johnson, Masanao Yajima, Joshua D. Campbell

Posted 17 Nov 2020
bioRxiv DOI: 10.1101/2020.11.16.385328

Performing comprehensive quality control is necessary to remove technical or biological artifacts in single-cell RNA sequencing (scRNA-seq) data. Artifacts in the scRNA-seq data, such as doublets or ambient RNA, can also hinder downstream clustering and marker selection and need to be assessed. While several algorithms have been developed to perform various quality control tasks, they are only available in different packages across various programming environments. No standardized workflow has been developed to streamline the generation and reporting of all quality control metrics from these tools. We have built an easy-to-use pipeline, named SCTK-QC, in the singleCellTK package that generates a comprehensive set of quality control metrics from a plethora of packages for quality control. We are able to import data from several preprocessing tools including CellRanger, STARSolo, BUSTools, dropEST, Optimus, and SEQC. Standard quality control metrics for each cell are calculated including the total number of UMIs, total number of genes detected, and the percentage of counts mapping to predefined gene sets such as mitochondrial genes. Doublet detection algorithms employed include scrublet, scds, doubletCells, and doubletFinder. DecontX is used to identify contamination in each individual cell. To make the data accessible in downstream analysis workflows, the results can be exported to common data structures in R and Python or to text files for use in any generic workflow. Overall, this pipeline will streamline and standardize quality control analyses for single-cell RNA-seq data across different platforms.

Competing Interest Statement

The authors have declared no competing interest.
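The standard per-cell metrics named in the abstract (total UMIs, number of genes detected, and percentage of counts in a predefined gene set) can be illustrated with a minimal Python sketch. This is not the SCTK-QC implementation, which lives in the singleCellTK R package; the function name `per_cell_qc`, the genes-by-cells list layout, and the `"MT-"` prefix used to pick out mitochondrial genes are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def per_cell_qc(
    counts: Sequence[Sequence[int]],
    genes: Sequence[str],
    gene_set_prefix: str = "MT-",  # assumed convention for human mitochondrial genes
) -> List[Dict[str, float]]:
    """Compute basic per-cell QC metrics from a genes x cells count matrix.

    Hypothetical helper for illustration only; not part of singleCellTK.
    """
    if len(counts) != len(genes):
        raise ValueError("counts must have one row per gene")
    n_cells = len(counts[0]) if counts else 0

    # Flag the rows whose gene symbol belongs to the predefined gene set.
    in_set = [g.upper().startswith(gene_set_prefix) for g in genes]

    metrics = []
    for cell in range(n_cells):
        column = [row[cell] for row in counts]
        total = sum(column)                         # total UMIs in this cell
        detected = sum(1 for c in column if c > 0)  # genes with at least one count
        set_total = sum(c for c, flag in zip(column, in_set) if flag)
        # Guard against empty cells to avoid dividing by zero.
        pct = 100.0 * set_total / total if total > 0 else 0.0
        metrics.append(
            {"total_counts": total, "genes_detected": detected, "pct_gene_set": pct}
        )
    return metrics


# Toy data: 4 genes (rows) x 3 cells (columns).
genes = ["MT-CO1", "MT-ND1", "ACTB", "GAPDH"]
counts = [
    [5, 0, 1],
    [5, 0, 0],
    [10, 3, 0],
    [0, 7, 0],
]

qc = per_cell_qc(counts, genes)
for i, m in enumerate(qc):
    print(i, m)
```

In this toy matrix the third cell has a single count, and it falls on a mitochondrial gene. A cell with very few counts and a high mitochondrial fraction is the kind that such metrics are typically used to flag for closer inspection.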

Download data

  • Downloaded 522 times
  • Download rankings, all-time:
    • Site-wide: 52,149
    • In bioinformatics: 5,337
  • Year to date:
    • Site-wide: 14,601
  • Since beginning of last month:
    • Site-wide: 33,639
