Rxivist.org combines preprints from bioRxiv.org with data from Twitter to help you find the papers being discussed in your field.
Currently indexing 83,609 bioRxiv papers from 360,279 authors.

Most downloaded bioRxiv papers in category bioinformatics, all time

Results 1 through 20 out of 7,875


1: Comparative analyses of SAR-CoV2 genomes from different geographical locations and other coronavirus family genomes reveals unique features potentially consequential to host-virus interaction and pathogenesis

Rahila Sardar, Deepshikha Satish et al.

79,294 downloads (posted 21 Mar 2020)

The ongoing coronavirus disease 2019 (COVID-19) pandemic is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV2). We have performed an integrated sequence-based analysis of SARS-CoV2 genomes from different geographical locations in order to identify unique features, absent in SARS-CoV and other related coronavirus family genomes, that confer unique infection, transmission-facilitating, virulence and immunogenic features on the virus. The phylogeny of the genomes yields some interesting results. Systematic gene-level mutational analysis of the genomes has enabled us to identify several unique features of the SARS-CoV2 genome, which include a unique mutation in the spike surface glycoprotein (A930V (24351C>T)) in the Indian SARS-CoV2, absent in the other strains studied here. We have also predicted the impact of the mutations on spike glycoprotein function and stability using a computational approach. To gain further insights into host responses to viral infection, we predict that antiviral host miRNAs may be controlling viral pathogenesis. Our analysis reveals nine host miRNAs that can potentially target SARS-CoV2 genes. Interestingly, these nine miRNAs do not have targets in the SARS and MERS genomes. Also, hsa-miR-27b is the only unique miRNA with a target gene in the Indian SARS-CoV2 genome. We also predicted immune epitopes in the genomes.


2: Single-cell RNA expression profiling of ACE2, the receptor of SARS-CoV-2

Yu Zhao, Zixian Zhao et al.

56,965 downloads (posted 26 Jan 2020)

A novel coronavirus, SARS-CoV-2, was identified in Wuhan, Hubei Province, China in December 2019. According to a WHO report, this new coronavirus had resulted in 76,392 confirmed infections and 2,348 deaths in China by 22 February 2020, with a rapidly growing number of additional patients being identified internationally. SARS-CoV-2 was reported to share the same receptor, angiotensin-converting enzyme 2 (ACE2), with SARS-CoV. Here, based on the public database and the state-of-the-art single-cell RNA-Seq technique, we a...


3: Opportunities And Obstacles For Deep Learning In Biology And Medicine

Travers Ching, Daniel S. Himmelstein et al.

52,622 downloads (posted 28 May 2017)

Deep learning, which describes a class of machine learning algorithms, has recently shown impressive results across a variety of domains. Biology and medicine are data rich, but the data are complex and often ill-understood. Problems of this nature may be particularly well-suited to deep learning techniques. We examine applications of deep learning to a variety of biomedical problems - patient classification, fundamental biological processes, and treatment of patients - and discuss whether deep learning will transform ...


4: Third-generation sequencing and the future of genomics

Hayan Lee, James Gurtowski et al.

30,997 downloads (posted 13 Apr 2016)

Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base pairs long, third-generation single-molecule technologies generate reads over 10,000 bp long or map molecules over 100,000 bp long. We analyze how increased read lengths can be used to address long-standing problems in de novo genome assembly, structural variation analysis and haplotype phasing.


5: Salmon provides accurate, fast, and bias-aware transcript expression estimates using dual-phase inference

Rob Patro, Geet Duggal et al.

21,262 downloads (posted 27 Jun 2015)

We introduce Salmon, a new method for quantifying transcript abundance from RNA-seq reads that is highly accurate and very fast. Salmon is the first transcriptome-wide quantifier to model and correct for fragment GC content bias, which we demonstrate substantially improves the accuracy of abundance estimates and the reliability of subsequent differential expression analysis compared to existing methods that do not account for these biases. Salmon achieves its speed and accuracy by combining a new dual-phase parallel inf...


6: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Michael I. Love, Wolfgang Huber et al.

18,945 downloads (posted 19 Feb 2014)

In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the...


7: Moving beyond P values: Everyday data analysis with estimation plots

Joses Ho, Tayfun Tumkaya et al.

17,533 downloads (posted 26 Jul 2018)

Over the past 75 years, a number of statisticians have advised that the data-analysis method known as null-hypothesis significance testing (NHST) should be deprecated (Berkson, 1942; Halsey et al., 2015; Wasserstein et al., 2019). The limitations of NHST have been extensively discussed, with a broad consensus that current statistical practice in the biological sciences needs reform. However, there is less agreement on reform’s specific nature, with vigorous debate surrounding what would constitute a suitable alternative...


8: End-to-end differentiable learning of protein structure

Mohammed AlQuraishi

17,393 downloads (posted 14 Feb 2018)

Accurate prediction of protein structure is one of the central challenges of biochemistry. Despite significant progress made by co-evolution methods to predict protein structure from signatures of residue-residue coupling found in the evolutionary record, a direct and explicit mapping between protein sequence and structure remains elusive, with no substantial recent progress. Meanwhile, rapid developments in deep learning, which have found remarkable success in computer vision, natural language processing, and quantum c...


9: Content-Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy

Martin Weigert, Deborah Schmidt et al.

17,290 downloads (posted 19 Dec 2017)

Fluorescence microscopy is a key driver of discoveries in the life sciences, with observable phenomena being limited by the optics of the microscope, the chemistry of the fluorophores, and the maximum photon exposure tolerated by the sample. These limits necessitate trade-offs between imaging speed, spatial resolution, light exposure, and imaging depth. In this work we show how image restoration based on deep learning extends the range of biological phenomena observable by microscopy. On seven concrete examples we demon...


10: Evaluation of UMAP as an alternative to t-SNE for single-cell data

Etienne Becht, Charles-Antoine Dutertre et al.

16,283 downloads (posted 10 Apr 2018)

Uniform Manifold Approximation and Projection (UMAP) is a recently published non-linear dimensionality reduction technique. Another such algorithm, t-SNE, has been the default method for such tasks in recent years. Herein we comment on the usefulness of UMAP for high-dimensional cytometry and single-cell RNA sequencing, notably highlighting faster runtime and consistency, meaningful organization of cell clusters and preservation of continuums in UMAP compared to t-SNE.


11: A comparison of single-cell trajectory inference methods: towards more accurate and robust tools

Wouter Saelens, Robrecht Cannoodt et al.

15,280 downloads (posted 05 Mar 2018)

Using single-cell -omics data, it is now possible to computationally order cells along trajectories, allowing the unbiased study of cellular dynamic processes. Since 2014, more than 50 trajectory inference methods have been developed, each with its own set of methodological characteristics. As a result, choosing a method to infer trajectories is often challenging, since a comprehensive assessment of the performance and robustness of each method is still lacking. In order to facilitate the comparison of the results of th...


12: Visualizing Structure and Transitions for Biological Data Exploration

Kevin R Moon, David van Dijk et al.

14,837 downloads (posted 24 Mar 2017)

With the advent of high-throughput technologies measuring high-dimensional biological data, there is a pressing need for visualization tools that reveal the structure and emergent patterns of data in an intuitive form. We present PHATE, a visualization method that captures both local and global nonlinear structure in data by an information-geometric distance between datapoints. We perform extensive comparison between PHATE and other tools on a variety of artificial and biological datasets, and find that it consistently ...


13: DeepAD: Alzheimer's Disease Classification via Deep Convolutional Neural Networks using MRI and fMRI

Saman Sarraf, Danielle D. DeSouza et al.

14,278 downloads (posted 21 Aug 2016)

To extract patterns from neuroimaging data, various statistical methods and machine learning algorithms have been explored for the diagnosis of Alzheimer's disease among older adults in both clinical and research applications; however, distinguishing between Alzheimer's and healthy brain data has been challenging in older adults (age > 75) due to highly similar patterns of brain atrophy and image intensities. Recently, cutting-edge deep learning technologies have rapidly expanded into numerous fields, including medical ...


14: Integrated analyses of single-cell atlases reveal age, gender, and smoking status associations with cell type-specific expression of mediators of SARS-CoV-2 viral entry and highlights inflammatory programs in putative target cells

Christoph Muus, Malte D Luecken et al.

14,124 downloads (posted 20 Apr 2020)

The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, creates an urgent need for identifying molecular mechanisms that mediate viral entry, propagation, and tissue pathology. Cell membrane bound angiotensin-converting enzyme 2 (ACE2) and associated proteases, transmembrane protease serine 2 (TMPRSS2) and Cathepsin L (CTSL), were previously identified as mediators of SARS-CoV2 cellular entry. Here, we assess the cell type-specific RNA expression of ACE2, TMPRSS2, and CTSL through an integrated analysis of 10...


15: Flexible analysis of transcriptome assemblies with Ballgown

Alyssa C Frazee, Geo Pertea et al.

13,860 downloads (posted 30 Mar 2014)

We have built a statistical package called Ballgown for estimating differential expression of genes, transcripts, or exons from RNA sequencing experiments. Ballgown is designed to work with the popular Cufflinks transcript assembly software and uses well-motivated statistical methods to provide estimates of changes in expression. It permits statistical analysis at the transcript level for a wide variety of experimental designs, allows adjustment for confounders, and handles studies with continuous covariates. Ballgown p...


16: MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data

David van Dijk, Juozas Nainys et al.

13,801 downloads (posted 25 Feb 2017)

Single-cell RNA-sequencing is fast becoming a major technology that is revolutionizing biological discovery in fields such as development, immunology and cancer. The ability to simultaneously measure thousands of genes at single cell resolution allows, among other prospects, for the possibility of learning gene regulatory networks at large scales. However, scRNA-seq technologies suffer from many sources of significant technical noise, the most prominent of which is dropout due to inefficient mRNA capture. This results i...


17: Potential inhibitors for 2019-nCoV coronavirus M protease from clinically approved medicines

Xin Liu, Xiu-Jie Wang

13,383 downloads (posted 29 Jan 2020)

Starting in December 2019, a novel coronavirus, later named 2019-nCoV, was found to cause a severe and rapid pandemic in China. Based on its structural information, we have predicted a list of commercial medicines that may function as inhibitors of 2019-nCoV by targeting its main protease Mpro. These drugs may also be effective against other coronaviruses with similar Mpro binding sites and pocket structures.


18: Ancestry Composition: A Novel, Efficient Pipeline for Ancestry Deconvolution

Eric Y Durand, Chuong B Do et al.

13,041 downloads (posted 18 Oct 2014)

Ancestry deconvolution, the task of identifying the ancestral origin of chromosomal segments in admixed individuals, has important implications, from mapping disease genes to identifying candidate loci under natural selection. To date, however, most existing methods for ancestry deconvolution are limited to two or three ancestral populations, and cannot resolve contributions from populations related at a sub-continental scale. We describe Ancestry Composition, a modular three-stage pipeline that efficiently an...


19: RapMap: A Rapid, Sensitive and Accurate Tool for Mapping RNA-seq Reads to Transcriptomes

Avi Srivastava, Hirak Sarkar et al.

12,552 downloads (posted 22 Oct 2015)

Motivation: The alignment of sequencing reads to a transcriptome is a common and important step in many RNA-seq analysis tasks. When aligning RNA-seq reads directly to a transcriptome (as is common in the de novo setting or when a trusted reference annotation is available), care must be taken to report the potentially large number of multi-mapping locations per read. This can pose a substantial computational burden for existing aligners, and can considerably slow downstream analysis. Results: We introduce a novel concep...


20: Privacy-preserving generative deep neural networks support clinical data sharing

Brett K. Beaulieu-Jones, Zhiwei Steven Wu et al.

12,291 downloads (posted 05 Jul 2017)

Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and Results: Using pairs of deep neural networks, we generated simulated, synthetic "participants" that closely resemble participants of the SPRINT trial. We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants' data could identify a real participant i...