
Pseudoreplication bias in single-cell studies; a practical solution

By Kip D. Zimmerman, Mark A. Espeland, Carl D. Langefeld

Posted 15 Jan 2020
bioRxiv DOI: 10.1101/2020.01.15.906248

Cells from the same individual share a common genetic and environmental background and are not independent; they are therefore subsamples, or pseudoreplicates. Thus, single-cell data have a hierarchical structure that many current single-cell methods do not address, leading to biased inference, highly inflated type 1 error rates, and reduced robustness and reproducibility. This includes methods that use a batch effect correction for individual as a means of accounting for within-sample correlation. Here, we document this dependence across a range of cell types and show that "pseudo-bulk" aggregation methods are overly conservative and underpowered relative to mixed models. We propose applying two-part hurdle generalized linear mixed models with a random effect for individual to properly account for both zero inflation and the correlation structure among measures from cells within an individual. Finally, we provide power estimates across a range of experimental conditions to assist researchers in designing appropriately powered studies.

Competing Interest Statement: The authors have declared no competing interest.
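The pseudoreplication problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not implement the proposed two-part hurdle mixed model. It only compares, under a null of no group difference, a test that treats every cell as independent with a test on per-individual means. The design (5 individuals per group, 50 cells each, Gaussian values, equal between-individual and within-individual standard deviations) and all function names are assumptions chosen for illustration.

```python
import math
import random

# Assumed design: 5 individuals per group, so the aggregated test has 8 degrees of freedom.
N_INDIVIDUALS = 5
T_CRIT_DF8 = 2.306  # two-sided 5% critical value of Student's t with 8 df
Z_CRIT = 1.96       # normal approximation for the cell-level test (hundreds of df)


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * sample_variance(a) + (nb - 1) * sample_variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))


def simulate_group(rng, n_cells, sd_individual, sd_cell):
    """Return one list of cell values per individual; there is no true group effect."""
    group = []
    for _ in range(N_INDIVIDUALS):
        shift = rng.gauss(0.0, sd_individual)  # shared by all cells of this individual
        group.append([shift + rng.gauss(0.0, sd_cell) for _ in range(n_cells)])
    return group


def false_positive_rates(n_sims=500, n_cells=50, sd_individual=1.0, sd_cell=1.0, seed=0):
    """Estimate type 1 error of the cell-level and the aggregated test under the null."""
    rng = random.Random(seed)
    naive_hits = 0
    aggregated_hits = 0
    for _ in range(n_sims):
        g1 = simulate_group(rng, n_cells, sd_individual, sd_cell)
        g2 = simulate_group(rng, n_cells, sd_individual, sd_cell)

        # Naive analysis: pool all cells and ignore which individual they came from.
        cells1 = [x for ind in g1 for x in ind]
        cells2 = [x for ind in g2 for x in ind]
        if abs(pooled_t(cells1, cells2)) > Z_CRIT:
            naive_hits += 1

        # Aggregated analysis: one mean per individual, then a t-test on 5 vs 5 means.
        means1 = [mean(ind) for ind in g1]
        means2 = [mean(ind) for ind in g2]
        if abs(pooled_t(means1, means2)) > T_CRIT_DF8:
            aggregated_hits += 1
    return naive_hits / n_sims, aggregated_hits / n_sims


if __name__ == "__main__":
    naive, aggregated = false_positive_rates()
    print(f"cell-level test false positive rate: {naive:.3f}")
    print(f"per-individual-mean test false positive rate: {aggregated:.3f}")
```

Under these assumed settings, half of the total variance is shared within an individual, so the cell-level test should reject far more often than the nominal 5%, while the test on individual means should stay close to 5%. Setting `sd_individual=0.0` removes the within-individual correlation, and the two rates should then be similar.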

Download data

  • Downloaded 991 times
  • Download rankings, all-time:
    • Site-wide: 18,813
    • In bioinformatics: 2,325
  • Year to date:
    • Site-wide: 3,689
  • Since beginning of last month:
    • Site-wide: 21,319


