Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 67,354 bioRxiv papers from 296,792 authors.

Saturating Single-Cell Datasets

By Aparna Bhaduri, Tomasz J. Nowakowski, Alex A. Pollen, Arnold R. Kriegstein

Posted 12 Nov 2017
bioRxiv DOI: 10.1101/218370

High-throughput methods for profiling the transcriptomes of single cells have recently emerged as transformative approaches for large-scale population surveys of cellular diversity in heterogeneous primary tissues. Efficient generation of such an atlas will depend on sufficient sampling of the diverse cell types while remaining cost-effective to enable a comprehensive examination of organs, developmental stages, and individuals. To examine the relationship between cell number and transcriptional heterogeneity in the context of unbiased cell type classification, we explicitly explored the population structure of a publicly available 1.3 million cell dataset from the E18.5 mouse brain. We propose a computational framework for inferring the saturation point of cluster discovery in a single-cell mRNA-seq experiment, centered around cluster preservation in downsampled datasets. In addition, we introduce a complexity index, which characterizes the heterogeneity of cells in a given dataset. Using Cajal-Retzius cells as an example of a limited complexity dataset, we explored whether biological distinctions relate to technical clustering. Surprisingly, we found that clustering distinctions carrying biologically interpretable meaning are achieved with far fewer cells (20,000). Together, these findings suggest that most of the biologically interpretable insights from the 1.3 million cells can be recapitulated by analyzing 50,000 randomly selected cells, indicating that rather than profiling few individuals at high cellular coverage, the much-anticipated cell atlasing studies may benefit from profiling more individuals, or many time points, at lower cellular coverage.
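The abstract does not spell out how cluster preservation is measured, so the following is only a minimal sketch of the general idea: cluster the full dataset, cluster random subsamples of increasing size, and record what fraction of the full-data clusters reappear. Everything in it is an assumption made for illustration, not the authors' framework: the function names (`kmeans`, `preserved_fraction`, `saturation_curve`) are hypothetical, plain k-means stands in for whatever clustering method is actually used, a Jaccard overlap of at least 0.5 is an arbitrary matching rule, and the data are synthetic Gaussian blobs rather than expression profiles.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; a stand-in for any clustering method (assumption)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def preserved_fraction(ref_labels, new_labels, ref_ids=None, threshold=0.5):
    """Fraction of reference clusters matched by some new cluster.

    A reference cluster counts as preserved when its best Jaccard overlap
    with any new cluster reaches `threshold` (an illustrative choice).
    """
    if ref_ids is None:
        ref_ids = np.unique(ref_labels)
    new_ids = np.unique(new_labels)
    kept = 0
    for r in ref_ids:
        in_ref = ref_labels == r
        best = 0.0
        for n in new_ids:
            in_new = new_labels == n
            inter = np.sum(in_ref & in_new)
            union = np.sum(in_ref | in_new)
            best = max(best, inter / union)
        if best >= threshold:
            kept += 1
    return kept / len(ref_ids)

def saturation_curve(X, k, sizes, seed=0):
    """Cluster preservation as a function of subsample size."""
    rng = np.random.default_rng(seed)
    full_labels = kmeans(X, k, rng)
    all_ids = np.unique(full_labels)
    curve = []
    for n in sizes:
        idx = rng.choice(len(X), size=n, replace=False)
        sub_labels = kmeans(X[idx], k, rng)
        # Compare full-data labels of the sampled cells with the re-clustering.
        frac = preserved_fraction(full_labels[idx], sub_labels, ref_ids=all_ids)
        curve.append((n, frac))
    return curve

# Synthetic stand-in data: five Gaussian blobs in two dimensions.
data_rng = np.random.default_rng(1)
blob_centers = data_rng.normal(scale=10.0, size=(5, 2))
X = np.vstack([c + data_rng.normal(size=(200, 2)) for c in blob_centers])

curve = saturation_curve(X, k=5, sizes=[50, 200, 1000])
for n, frac in curve:
    print(n, frac)
```

Under these assumptions, a saturation point would be read off as the smallest subsample size beyond which the preserved fraction stops increasing; a real analysis would repeat each subsample size several times and average.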

Download data

  • Downloaded 1,479 times
  • Download rankings, all-time:
    • Site-wide: 4,084 out of 67,403
    • In bioinformatics: 805 out of 6,642
  • Year to date:
    • Site-wide: 15,268 out of 67,403
  • Since beginning of last month:
    • Site-wide: 14,056 out of 67,403
