Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 73,960 bioRxiv papers from 321,862 authors.
Imputation of single-cell gene expression with an autoencoder neural network
Background: Single-cell RNA-sequencing (scRNA-seq) is a rapidly evolving technology that enables measurement of gene expression levels at an unprecedented resolution. Despite the explosive growth in the number of cells that can be assayed by a single experiment, scRNA-seq still has several limitations, including high rates of dropouts, which result in a large number of genes having zero read count in the scRNA-seq data and complicate downstream analyses.
Methods: To overcome this problem, we treat zeros as missing values and develop nonparametric deep learning methods for imputation. Specifically, our LATE (Learning with AuToEncoder) method trains an autoencoder with random initial values of the parameters, whereas our TRANSLATE (TRANSfer learning with LATE) method further allows for the use of a reference gene expression data set to provide LATE with an initial set of parameter estimates.
Results: On both simulated and real data, LATE and TRANSLATE outperform existing scRNA-seq imputation methods, achieving lower mean squared error in most cases, recovering nonlinear gene-gene relationships, and better separating cell types. They are also highly scalable and can efficiently process over 1 million cells in just a few hours on a GPU.
Conclusions: We demonstrate that our nonparametric approach to imputation based on autoencoders is powerful and highly efficient.
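The idea in the abstract, training an autoencoder while treating zeros as missing values, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The single hidden layer, the plain gradient descent update, the synthetic low-rank data, the choice to keep observed values and fill only the zeros, and every function name (`init_params`, `forward`, `masked_mse`, `train`, `impute`, `simulate`) are assumptions made for illustration. The second run mimics the TRANSLATE idea by reusing parameters learned on a reference data set as the starting point.

```python
import numpy as np

def init_params(n_genes, n_hidden, rng):
    """Random starting weights, as in a LATE-style run (hypothetical helper)."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_genes, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_genes)),
        "b2": np.zeros(n_genes),
    }

def forward(params, X):
    """Encode with a ReLU hidden layer, decode with a linear output layer."""
    H = np.maximum(0.0, X @ params["W1"] + params["b1"])
    return H, H @ params["W2"] + params["b2"]

def masked_mse(X_hat, X):
    """MSE over nonzero input entries only; zeros are treated as missing."""
    mask = X != 0
    return float(np.mean((X_hat[mask] - X[mask]) ** 2))

def train(X, params, epochs=500, lr=0.1):
    """Plain gradient descent on the masked MSE (a simplification, not Adam)."""
    params = {k: v.copy() for k, v in params.items()}  # leave the caller's copy intact
    mask = (X != 0).astype(float)
    history = []
    for _ in range(epochs):
        H, X_hat = forward(params, X)
        history.append(masked_mse(X_hat, X))
        d_out = 2.0 * (X_hat - X) * mask / mask.sum()  # gradient w.r.t. the output
        d_hid = (d_out @ params["W2"].T) * (H > 0)     # back through the ReLU
        grads = {"W1": X.T @ d_hid, "b1": d_hid.sum(axis=0),
                 "W2": H.T @ d_out, "b2": d_out.sum(axis=0)}
        for name in params:
            params[name] -= lr * grads[name]
    return params, history

def impute(params, X):
    """Fill zeros with the autoencoder output; keep observed values (sketch choice)."""
    return np.where(X == 0, forward(params, X)[1], X)

def simulate(n_cells, loadings, rng, dropout=0.3):
    """Synthetic low-rank 'expression' matrix with random dropout zeros (toy data)."""
    truth = rng.random((n_cells, loadings.shape[0])) @ loadings
    return truth * (rng.random(truth.shape) > dropout)

rng = np.random.default_rng(0)
loadings = rng.random((3, 20))            # gene structure shared by both data sets
target = simulate(60, loadings, rng)      # data set to impute
reference = simulate(200, loadings, rng)  # reference data set

# LATE-style: start from random parameters.
late_params, late_hist = train(target, init_params(20, 8, rng))

# TRANSLATE-style: pretrain on the reference, then continue on the target.
pretrained, _ = train(reference, init_params(20, 8, rng))
translate_params, translate_hist = train(target, pretrained)

imputed = impute(late_params, target)
print(f"LATE-style loss: {late_hist[0]:.3f} -> {late_hist[-1]:.3f}")
print(f"TRANSLATE-style starting loss: {translate_hist[0]:.3f}")
```

Masking the loss is the key design choice in this sketch: because zero entries contribute no gradient, the network is never trained to reproduce dropouts and is free to predict nonzero values where the input was zero.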
Abbreviations:
- Adam: Adaptive Moment estimation
- ALRA: Adaptively-thresholded Low-Rank Approximation
- BCSS: Between-Cluster Sum of Squares
- CPU: Central Processing Unit
- DCA: Deep Count Autoencoder
- GB: Gigabyte
- GPU: Graphics Processing Unit
- GTEx: Genotype-Tissue Expression
- gtMSE: Mean Squared Error comparing with the ground truth
- gtMSEall: Mean Squared Error comparing with the ground truth on all values
- gtMSEnz: Mean Squared Error comparing with the ground truth only on nonzero values
- LATE: Learning with AuToEncoder
- MAGIC: Markov Affinity-based Graph Imputation of Cells
- MSE: Mean Squared Error
- PBMC: Peripheral Blood Mononuclear Cell
- PC: Principal Component
- PCA: Principal Component Analysis
- RAM: Random Access Memory
- ReLU: Rectified Linear Unit
- SAVER: Single-cell Analysis Via Expression Recovery
- scRNA-seq: single-cell RNA-sequencing
- scVI: single-cell Variational Inference
- SVD: Singular Value Decomposition
- TB: Terabyte
- TRANSLATE: TRANSfer learning with LATE
- tSNE: t-distributed Stochastic Neighbor Embedding
- TSS: Total Sum of Squares
- WCSS: Within-Cluster Sum of Squares
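To make the two ground-truth error metrics in the list concrete, here is a small sketch of how gtMSEall and gtMSEnz might be computed. The function names are hypothetical, and the sketch assumes that "nonzero values" refers to entries that are nonzero in the ground truth; the list does not say whether the input matrix is meant instead.

```python
import numpy as np

def gt_mse_all(imputed, truth):
    """Mean squared error against the ground truth over every entry."""
    return float(np.mean((imputed - truth) ** 2))

def gt_mse_nz(imputed, truth):
    """Mean squared error restricted to entries that are nonzero in the
    ground truth (assumption: the mask comes from the truth matrix)."""
    nz = truth != 0
    return float(np.mean((imputed[nz] - truth[nz]) ** 2))

# Toy 2 x 2 matrices (cells x genes), invented for illustration.
truth = np.array([[1.0, 0.0], [2.0, 4.0]])
imputed = np.array([[1.5, 0.5], [2.0, 3.0]])

mse_all = gt_mse_all(imputed, truth)  # averages all four squared errors
mse_nz = gt_mse_nz(imputed, truth)    # skips the entry whose true value is 0
print(mse_all, mse_nz)
```

The two numbers differ because gtMSEall also counts the error made on true zeros, while gtMSEnz looks only at values that were truly expressed.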
- Downloaded 1,707 times
- Download rankings, all-time:
  - Site-wide: 3,579 out of 74,003
  - In bioinformatics: 703 out of 7,193
- Year to date:
  - Site-wide: 4,248 out of 74,003
- Since beginning of last month:
  - Site-wide: 4,248 out of 74,003