A Bayesian nonparametric semi-supervised model for integration of multiple single-cell experiments

By Archit Verma, Barbara E. Engelhardt

Posted 15 Jan 2020
bioRxiv DOI: 10.1101/2020.01.14.906313

Joint analysis of multiple single-cell RNA-sequencing (scRNA-seq) data sets is confounded by technical batch effects across experiments, biological or environmental variability across cells, and different capture processes across sequencing platforms. Manifold alignment is a principled, effective tool for integrating multiple data sets and controlling for confounding factors. We demonstrate that the semi-supervised t-distributed Gaussian process latent variable model (sstGPLVM), which projects the data onto a mixture of fixed and latent dimensions, can learn a unified low-dimensional embedding for multiple single-cell experiments with minimal assumptions. We show the efficacy of the model as compared with state-of-the-art methods for single-cell data integration on simulated data, pancreas cells from four sequencing technologies, induced pluripotent stem cells from male and female donors, and mouse brain cells from both spatial seqFISH+ and traditional scRNA-seq. Code and data are available at <https://github.com/architverma1/sc-manifold-alignment>.

Download data

  • Downloaded 917 times
  • Download rankings, all-time:
    • Site-wide: 13,106 out of 89,211
    • In bioinformatics: 2,080 out of 8,418
  • Year to date:
    • Site-wide: 1,623 out of 89,211
  • Since beginning of last month:
    • Site-wide: 13,828 out of 89,211

