
A Comprehensive Multi-Center Cross-platform Benchmarking Study of Single-cell RNA Sequencing Using Reference Samples

By Wanqiu Chen, Yongmei Zhao, Xin Chen, Xiaojiang Xu, Zhaowei Yang, Yingtao Bi, Vicky Chen, Jing Li, Hannah Choi, Ben Ernest, Bao Tran, Monika Mehta, Malcolm Moos, Andrew Farmer, Alain Mir, Parimal Kumar, Urvashi Mehra, Jian-Liang Li, Wenming Xiao, Charles Wang

Posted 29 Mar 2020
bioRxiv DOI: 10.1101/2020.03.27.010249

Single-cell RNA sequencing (scRNA-seq) has become a powerful technology for biomedical research and is becoming much more affordable as methods continue to evolve, but it remains unknown how reproducible results are across different platforms and bioinformatics pipelines, particularly with the recently developed scRNA-seq batch correction algorithms. We carried out a comprehensive multi-center cross-platform comparison of different scRNA-seq platforms using standard reference samples. We compared six preprocessing pipelines, seven bioinformatics normalization procedures, and seven batch effect correction methods (CCA, MNN, Scanorama, BBKNN, Harmony, limma and ComBat) to evaluate the performance and reproducibility of 20 scRNA-seq datasets derived from four different platforms and centers. We benchmarked scRNA-seq performance across platforms and testing sites using global gene expression profiles as well as some cell-type-specific marker genes. We showed that there were large batch effects, and that the reproducibility of scRNA-seq across platforms was dictated by both the expression level of the genes selected and the batch correction method used. We found that CCA, MNN, and BBKNN all corrected the batch variations fairly well for scRNA-seq data derived from biologically similar samples across platforms/sites. However, for scRNA-seq data derived from or consisting of biologically distinct samples, limma and ComBat failed to correct batch effects, whereas CCA over-corrected the batch effect and misclassified the cell types and samples. In contrast, MNN, Harmony and BBKNN separated biologically different samples/cell types into correspondingly distinct dimensional subspaces; however, consistent with the algorithm's logic, MNN required that each of the samples evaluated contain a shared portion of highly similar cells.
In summary, we found strong cross-platform consistency in separating two distinct samples when an appropriate batch correction method was used. We hope this large cross-platform/site scRNA-seq data set will provide a valuable resource, and that our findings will offer useful advice for the single-cell sequencing community.
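
The abstract's remark that MNN needs each sample to contain a shared portion of highly similar cells can be illustrated with a toy sketch. The code below is not the authors' pipeline and not a published MNN implementation. It only shows the pairing step, on hypothetical 2-D synthetic data, with an assumed helper name (`mutual_nearest_neighbors`) and an assumed neighbor count (`k=3`). Batch A holds two made-up cell types, batch B holds only one of them with a small batch shift.

```python
import numpy as np


def mutual_nearest_neighbors(a, b, k=3):
    """Return (i, j) pairs where a[i] and b[j] are in each other's k nearest neighbors.

    Hypothetical helper for illustration; real tools add normalization and
    compute correction vectors from these pairs, which is omitted here.
    """
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # For each cell in a: indices of its k closest cells in b.
    nn_ab = np.argsort(dists, axis=1)[:, :k]
    # For each cell in b: indices of its k closest cells in a.
    nn_ba = np.argsort(dists, axis=0)[:k, :].T
    pairs = []
    for i in range(a.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return pairs


rng = np.random.default_rng(0)
# Assumed toy embeddings: rows 0-19 of batch A are "type X", rows 20-39 are "type Y".
type_x_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2))
type_y_a = rng.normal(loc=(10.0, 10.0), scale=0.3, size=(20, 2))
batch_a = np.vstack([type_x_a, type_y_a])
# Batch B contains only "type X", displaced by a simulated batch effect.
batch_b = rng.normal(loc=(1.0, 1.0), scale=0.3, size=(20, 2))

pairs = mutual_nearest_neighbors(batch_a, batch_b, k=3)
# Every pair should be anchored in the cell type that both batches share.
anchored_in_shared_type = all(i < 20 for i, _ in pairs)
print(len(pairs), anchored_in_shared_type)
```

In this toy setting the mutual pairs come only from the cell type present in both batches, so any correction derived from them rests entirely on that shared population. If no such population existed, the nearest pairs would link biologically different cells, which is one way to read the requirement stated in the abstract.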

Download data

  • Downloaded 1,400 times
  • Download rankings, all-time:
    • Site-wide: 17,871
    • In genomics: 1,676
  • Year to date:
    • Site-wide: 49,770
  • Since beginning of last month:
    • Site-wide: 84,093
