
Single-cell transcriptome profiling simulation reveals the impact of sequencing parameters and algorithms on clustering

By Yunhe Liu, Bisheng Shi, Aoshen Wu, Xueqing Peng, Zhenghong Yuan, Gang Liu, Lei Liu

Posted 16 Mar 2021
bioRxiv DOI: 10.1101/2021.03.16.435626

Although scRNA-seq analytic algorithms have been developed, their performance for cell clustering cannot be quantified because the "true" clusters are unknown. Referencing the transcriptomic heterogeneity of cell clusters, a "true" mRNA number matrix of individual cells was defined as ground truth. Based on this matrix and the real data generation procedure, a simulation program (SSCRNA) for raw data was developed. The consistency between simulated data and real data was then evaluated, and the impact of sequencing depth and of the analysis algorithms on clustering accuracy was quantified. The simulation results are highly consistent with the real data. The misclassification rate on current scRNA platforms can be attributed to multiple causes, and clustering accuracy is not only sensitive to increases in sequencing depth but is also reflected in the position of the cluster on the t-SNE plot. Among the normalization algorithms, the Gaussian normalization method is more appropriate for current workflows. Among the clustering algorithms, the k-means&louvain clustering method performs better on dimension-reduced data than on full data, while the k-means clustering method is stable in both situations. In conclusion, the scRNA simulation algorithm developed here reproduces the real data generation process, reveals the impact of parameters on mis-clustering, compares the normalization and clustering algorithms, and provides novel insight into scRNA analyses.
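
The general idea of measuring clustering error against a simulated ground truth can be illustrated with a minimal sketch. This is not the SSCRNA program: the two-cluster setup, the approximately Gaussian "true" counts, the binomial capture step standing in for sequencing depth, and every function name and parameter value below are assumptions made only for illustration.

```python
import random


def simulate_true_counts(n_cells_per_cluster, n_genes, n_marker, rng):
    """Hypothetical ground-truth mRNA counts for two cell clusters."""
    cells, labels = [], []
    for cluster in (0, 1):
        for _ in range(n_cells_per_cluster):
            row = []
            for g in range(n_genes):
                # Assumption: the first n_marker genes are more abundant in cluster 1.
                mean = 60 if (cluster == 1 and g < n_marker) else 20
                row.append(max(0, round(rng.gauss(mean, mean ** 0.5))))
            cells.append(row)
            labels.append(cluster)
    return cells, labels


def sequence(true_counts, capture_prob, rng):
    """Binomial thinning: each molecule is observed with probability capture_prob."""
    return [
        [sum(1 for _ in range(c) if rng.random() < capture_prob) for c in row]
        for row in true_counts
    ]


def two_means(data, rng, n_iter=20):
    """Plain k-means with k=2 and squared Euclidean distance."""
    centers = [list(map(float, row)) for row in rng.sample(data, 2)]
    assign = [0] * len(data)
    for _ in range(n_iter):
        for i, row in enumerate(data):
            dists = [sum((x - c) ** 2 for x, c in zip(row, ctr)) for ctr in centers]
            assign[i] = 0 if dists[0] <= dists[1] else 1
        for k in (0, 1):
            members = [row for row, a in zip(data, assign) if a == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def misclassification_rate(labels, pred):
    """Fraction of mislabeled cells, minimized over the two label permutations."""
    err = sum(1 for a, b in zip(labels, pred) if a != b) / len(labels)
    return min(err, 1.0 - err)


if __name__ == "__main__":
    rng = random.Random(0)
    truth, labels = simulate_true_counts(30, 20, 5, rng)
    for depth in (0.01, 0.05, 0.2, 1.0):
        observed = sequence(truth, depth, rng)
        rate = misclassification_rate(labels, two_means(observed, rng))
        print(f"capture probability {depth}: misclassification rate {rate:.2f}")
```

Because the "true" labels are known by construction, the misclassification rate can be computed directly at each simulated depth, which is the kind of comparison that is impossible on real data alone.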

