Adversarial domain translation networks enable fast and accurate large-scale atlas-level single-cell data integration

By Jia Zhao, Gefei Wang, Jingsi Ming, Zhixiang Lin, Yang Wang, Tabula Microcebus Consortium, Angela Wu, Can Yang

Posted 19 Nov 2021
bioRxiv DOI: 10.1101/2021.11.16.468892

The rapid emergence of large-scale atlas-level single-cell RNA-sequencing (scRNA-seq) datasets from various sources presents remarkable opportunities for broad and deep biological investigations through integrative analyses. However, harmonizing such datasets requires integration approaches that are not only computationally scalable but also capable of preserving a wide range of fine-grained cell populations. We created Portal, a unified framework of adversarial domain translation that learns harmonized representations of datasets. Through innovations in model and algorithm design, Portal achieves superior performance in preserving biological variation during integration while significantly reducing running time and memory usage compared to existing approaches, integrating millions of cells in minutes. We demonstrate the efficiency and accuracy of Portal using diverse datasets, including mouse brain atlas projects, the Tabula Muris project, and the Tabula Microcebus project. Portal is broadly applicable: in addition to integrating multiple scRNA-seq datasets, it can integrate scRNA-seq with single-nucleus RNA-sequencing (snRNA-seq) data. Finally, we demonstrate the utility of Portal by applying it to the integration of cross-species datasets with limited shared information between them, gaining biological insights into the similarities and divergences in the spermatogenesis process among mouse, macaque, and human.

Download data

  • Downloaded 307 times
  • Download rankings, all-time:
    • Site-wide: 119,035
    • In bioinformatics: 9,716
  • Year to date:
    • Site-wide: 20,423
  • Since beginning of last month:
    • Site-wide: 5,632
