
Saturating Single-Cell Datasets

By Aparna Bhaduri, Tomasz J. Nowakowski, Alex A. Pollen, Arnold R. Kriegstein

Posted 12 Nov 2017
bioRxiv DOI: 10.1101/218370

High-throughput methods for profiling the transcriptomes of single cells have recently emerged as transformative approaches for large-scale population surveys of cellular diversity in heterogeneous primary tissues. Efficient generation of such an atlas will depend on sampling the diverse cell types sufficiently while remaining cost-effective, so that organs, developmental stages, and individuals can be examined comprehensively. To examine the relationship between cell number and transcriptional heterogeneity in the context of unbiased cell type classification, we explicitly explored the population structure of a publicly available 1.3 million cell dataset from the E18.5 mouse brain. We propose a computational framework for inferring the saturation point of cluster discovery in a single-cell mRNA-seq experiment, centered on cluster preservation in downsampled datasets. In addition, we introduce a complexity index, which characterizes the heterogeneity of cells in a given dataset. Using Cajal-Retzius cells as an example of a limited-complexity dataset, we explored whether biological distinctions relate to technical clustering. Surprisingly, we found that clustering distinctions carrying biologically interpretable meaning are achieved with far fewer cells (20,000). Together, these findings suggest that most of the biologically interpretable insights from the 1.3 million cells can be recapitulated by analyzing 50,000 randomly selected cells. This indicates that, instead of profiling a few individuals at high cellular coverage, the much-anticipated cell atlasing studies may benefit from profiling more individuals, or many time points, at lower cellular coverage.
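The idea of measuring cluster preservation in downsampled datasets can be illustrated with a minimal sketch. This is not the authors' pipeline: the synthetic Gaussian "cell types", the plain k-means stand-in for the clustering step, the Jaccard threshold of 0.5, and the helper names `make_blobs`, `kmeans`, `cluster_preservation`, and `preservation_curve` are all assumptions made for illustration only.

```python
import numpy as np

def make_blobs(sizes, dim=10, spread=1.0, seed=0):
    """Synthetic stand-in for expression data: one well-separated blob per 'cell type'."""
    rng = np.random.default_rng(seed)
    parts = []
    for j, n in enumerate(sizes):
        center = np.zeros(dim)
        center[j] = 10.0  # each hypothetical cell type sits on its own axis
        parts.append(center + spread * rng.normal(size=(n, dim)))
    return np.vstack(parts)

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next seed is the point farthest from all centers chosen so far.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def cluster_preservation(full_labels, sub_labels, sub_idx, threshold=0.5):
    """Fraction of full-data clusters matched by a subsample cluster (Jaccard >= threshold)."""
    ref = full_labels[sub_idx]  # full-data labels of the sampled cells
    full_clusters = np.unique(full_labels)
    preserved = 0
    for c in full_clusters:
        a = ref == c
        if not a.any():
            continue  # cluster was not sampled at all, so it counts as lost
        best = max(
            np.sum(a & (sub_labels == s)) / np.sum(a | (sub_labels == s))
            for s in np.unique(sub_labels)
        )
        preserved += int(best >= threshold)
    return preserved / len(full_clusters)

def preservation_curve(X, k, fractions, seed=0):
    """Cluster the full data once, then re-cluster random subsamples of each size."""
    rng = np.random.default_rng(seed)
    full_labels = kmeans(X, k, seed=seed)
    curve = {}
    for f in fractions:
        size = max(k, int(round(f * len(X))))
        sub_idx = rng.choice(len(X), size=size, replace=False)
        sub_labels = kmeans(X[sub_idx], k, seed=seed + 1)
        curve[f] = cluster_preservation(full_labels, sub_labels, sub_idx)
    return curve

# Unequal cluster sizes, including one rare population of 50 cells.
X = make_blobs([2000, 1000, 500, 200, 50])
curve = preservation_curve(X, k=5, fractions=[1.0, 0.5, 0.1, 0.02])
for f, p in curve.items():
    print(f"fraction={f:<5} preserved={p:.2f}")
```

In this toy setting, the sampling fraction at which preservation stops improving plays the role of a saturation point. Rare populations are the first to drop out as the subsample shrinks, which is why the sketch includes a small cluster.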

Download data

  • Downloaded 1,638 times
  • Download rankings, all-time:
    • Site-wide: 5,103 out of 89,581
    • In bioinformatics: 927 out of 8,443
  • Year to date:
    • Site-wide: 25,666 out of 89,581
  • Since beginning of last month:
    • Site-wide: 50,575 out of 89,581
