cFIT: Integration and transfer learning of single cell transcriptomes, illustrated by fetal brain cell development
By Minshi Peng, Yue Li, Brie Wamsley, Yuting Wei, Kathryn Roeder
Posted 31 Aug 2020
bioRxiv DOI: 10.1101/2020.08.31.276345
Large, comprehensive collections of scRNA-seq data sets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these data sets, or to transfer knowledge from one to another, to better understand cellular identity and function. Here, we present a simple yet surprisingly effective method named cFIT for capturing various batch effects across experiments, technologies, subjects, and even species. The proposed method models the information shared between data sets by a common factor space, while allowing for unique distortions and shifts in gene-wise expression in each batch. The model parameters are learned under an iterative non-negative matrix factorization (NMF) framework and then used for synchronized integration of assays across domains. In addition, the model enables transfer via a low-rank matrix from more informative data, allowing precise identification in data of lower quality. Compared to existing approaches, our method imposes weaker assumptions on the cell composition of each individual data set, yet is shown to be more reliable in preserving biological variation. We apply cFIT to multiple scRNA-seq data sets of the developing brain from human and mouse, varying in technology and developmental stage. The successful integration and transfer uncover the transcriptional resemblance across systems. The study helps establish a comprehensive landscape of brain cell type diversity and provides insights into brain development.
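The model structure described in the abstract (a common factor space plus batch-specific gene-wise distortions and shifts) can be illustrated with a small NumPy sketch. This is not the cFIT implementation: the abstract does not give the exact parameterization, so the form X_j = (H_j W^T) * scale_j + shift_j, all variable and function names, and the simulated data are assumptions made for illustration. For brevity, the batch-specific scale and shift are treated as known, whereas the method described above learns them iteratively together with the factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_factors = 50, 3

# Shared, non-negative gene-by-factor matrix (the assumed "common factor space").
W = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_factors))

def simulate_batch(n_cells, scale, shift):
    """Simulate one batch under the assumed model X = (H @ W.T) * scale + shift."""
    H = rng.dirichlet(np.ones(n_factors), size=n_cells)  # non-negative loadings
    return H, (H @ W.T) * scale + shift

def undo_batch_effect(X, scale, shift):
    """Map a batch back to the common space by removing its gene-wise shift and scale."""
    return (X - shift) / scale

def fit_loadings(X, W, n_iter=500, eps=1e-12):
    """Estimate non-negative H in X ~ H @ W.T with W held fixed,
    using standard multiplicative NMF updates (squared-error objective)."""
    H = np.full((X.shape[0], W.shape[1]), 1.0 / W.shape[1])
    WtW = W.T @ W
    for _ in range(n_iter):
        H *= (X @ W) / (H @ WtW + eps)
    return H

# Two batches that share W but have their own gene-wise scale and shift.
scale_a, shift_a = rng.uniform(0.5, 2.0, n_genes), rng.uniform(0.0, 1.0, n_genes)
scale_b, shift_b = rng.uniform(0.5, 2.0, n_genes), rng.uniform(0.0, 1.0, n_genes)
H_a, X_a = simulate_batch(40, scale_a, shift_a)
H_b, X_b = simulate_batch(60, scale_b, shift_b)

# Integration step of the sketch: remove batch effects, then stack the batches.
X_corrected = np.vstack([
    undo_batch_effect(X_a, scale_a, shift_a),
    undo_batch_effect(X_b, scale_b, shift_b),
])

# Compare the fit before and after estimating the loadings.
H_init = np.full((X_corrected.shape[0], n_factors), 1.0 / n_factors)
err_init = np.linalg.norm(X_corrected - H_init @ W.T)
H_hat = fit_loadings(X_corrected, W)
err_fit = np.linalg.norm(X_corrected - H_hat @ W.T)
```

In this toy setting, once each batch's scale and shift are removed, cells from both batches lie in the same space spanned by W, so a single set of factor loadings describes them jointly. A full alternating scheme would also update W, the scales, and the shifts in turn.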
Download data
- Downloaded 692 times
- Download rankings, all-time:
  - Site-wide: 69,245
  - In genomics: 4,681
- Year to date:
  - Site-wide: 167,397
- Since beginning of last month:
  - Site-wide: 102,936