Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 73,582 bioRxiv papers from 320,155 authors.
Most downloaded bioRxiv papers, all time
72,293 results found.
5,655 downloads neuroscience
Deep neural networks (DNNs) have recently been applied successfully to brain decoding and image reconstruction from functional magnetic resonance imaging (fMRI) activity. However, direct training of a DNN with fMRI data is often avoided because the size of available data is thought to be insufficient to train a complex network with numerous parameters. Instead, a pre-trained DNN has served as a proxy for hierarchical visual representations, and fMRI data were used to decode individual DNN features of a stimulus image using a simple linear model, which were then passed to a reconstruction module. Here, we present our attempt to directly train a DNN model with fMRI data and the corresponding stimulus images to build an end-to-end reconstruction model. We trained a generative adversarial network with an additional loss term defined in a high-level feature space (feature loss) using up to 6,000 training data points (natural images and the fMRI responses). The trained deep generator network was tested on an independent dataset, directly producing a reconstructed image given an fMRI pattern as the input. The reconstructions obtained from the proposed method resembled both natural and artificial test stimuli. The accuracy increased as a function of the training data size, though it did not outperform the decoded feature-based method with the available data size. Ablation analyses indicated that the feature loss played a critical role in achieving accurate reconstruction. Our results suggest a potential for the end-to-end framework to learn a direct mapping between brain activity and perception given even larger datasets.
5,649 downloads bioinformatics
Motivation: Large-scale gene expression profiling has been widely used to characterize cellular states in response to various disease conditions, genetic perturbations, etc. Although the cost of whole-genome expression profiles has been dropping steadily, generating a compendium of expression profiling over thousands of samples is still very expensive. Recognizing that gene expressions are often highly correlated, researchers from the NIH LINCS program have developed a cost-effective strategy of profiling only ~1,000 carefully selected landmark genes and relying on computational methods to infer the expression of remaining target genes. However, the computational approach adopted by the LINCS program is currently based on linear regression, limiting its accuracy since it does not capture complex nonlinear relationships between the expression of genes. Results: We present a deep learning method (abbreviated as D-GEX) to infer the expression of target genes from the expression of landmark genes. We used the microarray-based GEO dataset, consisting of 111K expression profiles, to train our model and compare its performance to those from other methods. In terms of mean absolute error averaged across all genes, deep learning significantly outperforms linear regression with 15.33% relative improvement. A gene-wise comparative analysis shows that deep learning achieves lower error than linear regression in 99.97% of the target genes. We also tested the performance of our learned model on an independent RNA-Seq-based GTEx dataset, which consists of 2,921 expression profiles. Deep learning still outperforms linear regression with 6.57% relative improvement, and achieves lower error in 81.31% of the target genes. Availability: D-GEX is available at https://github.com/uci-cbcl/D-GEX.
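The "relative improvement" figures in this abstract compare mean absolute error (MAE) between two models. The sketch below illustrates one common way to compute such a figure, assuming relative improvement means (baseline error - model error) / baseline error; the abstract does not spell out the exact formula, and the function names and numbers here are hypothetical, not taken from D-GEX.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between paired predictions and observations."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def relative_improvement(baseline_error, model_error):
    """Fractional error reduction of a model relative to a baseline (assumed definition)."""
    return (baseline_error - model_error) / baseline_error


# Made-up numbers for illustration only; these are not data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
linear_pred = [1.5, 2.5, 2.5, 4.5]      # hypothetical linear-regression output
deep_pred = [1.25, 2.25, 2.75, 4.25]    # hypothetical deep-model output

baseline_mae = mean_absolute_error(linear_pred, observed)
model_mae = mean_absolute_error(deep_pred, observed)
improvement = relative_improvement(baseline_mae, model_mae)
print(f"baseline MAE={baseline_mae}, model MAE={model_mae}, improvement={improvement:.0%}")
```

Under this definition, a value of 0.1533 would mean the model's MAE is about 15% lower than the baseline's.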
5,647 downloads genomics
Martin Sikora, Vladimir V. Pitulko, Vitor C. Sousa, Morten E. Allentoft, Lasse Vinner, Simon Rasmussen, Ashot Margaryan, Peter de Barros Damgaard, Constanza de la Fuente Castro, Gabriel Renaud, Melinda Yang, Qiaomei Fu, Isabelle Dupanloup, Konstantinos Giampoudakis, David Bravo Nogues, Carsten Rahbek, Guus Kroonen, Michäel Peyrot, Hugh McColl, Sergey V. Vasilyev, Elizaveta Veselovskaya, Margarita Gerasimova, Elena Y. Pavlova, Vyacheslav G. Chasnyk, Pavel A. Nikolskiy, Pavel S. Grebenyuk, Alexander Yu. Fedorchenko, Alexander I. Lebedintsev, Sergey B. Slobodin, Boris A. Malyarchuk, Rui Martiniano, Morten Meldgaard, Laura Arppe, Jukka U. Palo, Tarja Sundell, Kristiina Mannermaa, Mikko Putkonen, Verner Alexandersen, Charlotte Primeau, Ripan Mahli, Karl-Göran Sjögren, Kristian Kristiansen, Anna Wessman, Antti Sajantila, Marta Mirazon Lahr, Richard Durbin, Rasmus Nielsen, David J. Meltzer, Laurent Excoffier, Eske Willerslev
Far northeastern Siberia has been occupied by humans for more than 40 thousand years. Yet, owing to a scarcity of early archaeological sites and human remains, its population history and relationship to ancient and modern populations across Eurasia and the Americas are poorly understood. Here, we report 34 ancient genome sequences, including two from fragmented milk teeth found at the ~31.6 thousand-year-old (kya) Yana RHS site, the earliest and northernmost Pleistocene human remains found. These genomes reveal complex patterns of past population admixture and replacement events throughout northeastern Siberia, with evidence for at least three large-scale human migrations into the region. The first inhabitants, a previously unknown population of "Ancient North Siberians" (ANS), represented by Yana RHS, diverged ~38 kya from Western Eurasians, soon after the latter split from East Asians. Between 20 and 11 kya, the ANS population was largely replaced by peoples with ancestry from East Asia, giving rise to ancestral Native Americans and "Ancient Paleosiberians" (AP), represented by a 9.8 kya skeleton from Kolyma River. AP are closely related to the Siberian ancestors of Native Americans, and ancestral to contemporary communities such as Koryaks and Itelmen. Paleoclimatic modelling shows evidence for a refuge during the last glacial maximum (LGM) in southeastern Beringia, suggesting Beringia as a possible location for the admixture forming both ancestral Native Americans and AP. Between 11 and 4 kya, AP were in turn largely replaced by another group of peoples with ancestry from East Asia, the "Neosiberians" from which many contemporary Siberians derive. We detect additional gene flow events in both directions across the Bering Strait during this time, influencing the genetic composition of Inuit, as well as Na Dene-speaking Northern Native Americans, whose Siberian-related ancestry component is closely related to AP.
Our analyses reveal that the population history of northeastern Siberia was highly dynamic, starting in the Late Pleistocene and continuing well into the Late Holocene. The pattern observed in northeastern Siberia, with earlier, once widespread populations being replaced by distinct peoples, seems to have taken place across northern Eurasia, as far west as Scandinavia.
5,642 downloads biophysics
The acquisition of cryo-electron microscopy (cryo-EM) data from biological specimens is currently largely uncoupled from subsequent data evaluation, correction and processing. Therefore, the acquisition strategy is difficult to optimize during data collection, often leading to suboptimal microscope usage and disappointing results. Here we provide Warp, software for real-time evaluation, correction, and processing of cryo-EM data during their acquisition. Warp evaluates and monitors key parameters for each recorded micrograph or tomographic tilt series in real time. Warp also rapidly corrects micrographs for global and local motion, and estimates the local defocus with the use of novel algorithms. The software further includes a deep learning-based particle picking algorithm that rivals human accuracy to make the pre-processing pipeline truly automated. The output from Warp can be directly fed into established tools for particle classification and 3D image reconstruction. In a benchmarking study we show that Warp automatically processed a published cryo-EM data set for influenza virus hemagglutinin, leading to an improvement of the nominal resolution from 3.9 Å to 3.2 Å. Warp is easy to install, computationally inexpensive, and has an intuitive and streamlined user interface.
5,638 downloads synthetic biology
Alejandro Chavez, Jonathan Scheiman, Suhani Vora, Benjamin W Pruitt, Marcelle Tuttle, Eswar Iyer, Samira Kiani, Christopher D Guzman, Daniel J Wiegand, Dimtry Ter-Ovanesyan, Jonathan L Braff, Noah Davidsohn, Ron Weiss, John Aach, James J. Collins, George M. Church
The RNA-guided bacterial nuclease Cas9 can be reengineered as a programmable transcription factor by a series of changes to the Cas9 protein in addition to the fusion of a transcriptional activation domain (AD). However, the modest levels of gene activation achieved by current Cas9 activators have limited their potential applications. Here we describe the development of an improved transcriptional regulator through the rational design of a tripartite activator, VP64-p65-Rta (VPR), fused to Cas9. We demonstrate its utility in activating expression of endogenous coding and non-coding genes, targeting several genes simultaneously and stimulating neuronal differentiation of induced pluripotent stem cells (iPSCs).
5,637 downloads neuroscience
In response to reports of inflated false positive rate (FPR) in FMRI group analysis tools, a series of replications, investigations, and software modifications were made to address this issue. While these investigations continue, significant progress has been made to adapt AFNI to fix such problems. Two separate lines of changes have been made. First, a long-tailed model for the spatial correlation of the FMRI noise characterized by autocorrelation function (ACF) was developed and implemented into the 3dClustSim tool for determining the cluster-size threshold to use for a given voxel-wise threshold. Second, the 3dttest++ program was modified to do randomization of the voxel-wise t-tests and then to feed those randomized t-statistic maps into 3dClustSim directly for cluster-size threshold determination, without any spatial model for the ACF. These approaches were tested with the Beijing subset of the FCON-1000 data collection. The first approach shows markedly improved (reduced) FPR, but in many cases is still above the nominal 5%. The second approach shows FPRs clustered tightly about 5% across all per-voxel p-value thresholds ≤ 0.01. If t-tests from a univariate GLM are adequate for the group analysis in question, the second approach is what the AFNI group currently recommends for thresholding. If more complex per-voxel statistical analyses are required (where permutation/randomization is impracticable), then our current recommendation is to use the new ACF modeling approach coupled with a per-voxel p-threshold of 0.001 or below. Simulations were also repeated with the now infamously "buggy" version of 3dClustSim: the effect of the bug on FPRs was minimal (of order a few percent).
5,625 downloads genomics
Mandeep Singh, Ghamdan Al-Eryani, Shaun Carswell, James M. Ferguson, James Blackburn, Kirston Barton, Daniel Roden, Fabio Luciani, Tri Phan, Simon Junankar, Katherine Jackson, Christopher C. Goodnow, Martin A. Smith, Alexander Swarbrick
High-throughput single-cell RNA-Sequencing is a powerful technique for gene expression profiling of complex and heterogeneous cellular populations such as the immune system. However, these methods only provide short-read sequence from one end of a cDNA template, making them poorly suited to the investigation of gene-regulatory events such as mRNA splicing, adaptive immune responses or somatic genome evolution. To address this challenge, we have developed a method that combines targeted long-read sequencing with short-read based transcriptome profiling of barcoded single cell libraries generated by droplet-based partitioning. We use Repertoire And Gene Expression sequencing (RAGE-seq) to accurately characterize full-length T cell receptor (TCR) and B cell receptor (BCR) sequences and transcriptional profiles of more than 7,138 lymphocytes sampled from the primary tumour and draining lymph node of a breast cancer patient. With this method we show that somatic mutation, alternate splicing and clonal evolution of T and B lymphocytes can be tracked across these tissue compartments. Our results demonstrate that RAGE-seq is an accessible and cost-effective method for high-throughput deep single cell profiling, applicable to a wide range of biological challenges.
5,624 downloads cancer biology
Cancer cell lines are often used in laboratory experiments as models of tumors, although they can have substantially different genetic and epigenetic profiles compared to tumors. We have developed a general computational method, TumorComparer, to systematically quantify similarities and differences between tumor material when detailed genetic and molecular profiles are available. The comparisons can be flexibly tailored to a particular biological question by placing a higher weight on functional alterations of interest (weighted similarity). In a first pan-cancer application, we have compared 260 cell lines from the Cancer Cell Line Encyclopaedia (CCLE) and 1914 tumors of six different cancer types from The Cancer Genome Atlas (TCGA), using weights to emphasize genomic alterations that frequently recur in tumors. We report the potential suitability of particular cell lines as tumor models and identify apparently unsuitable outlier cell lines, some of which are in wide use, for each of the six cancer types. In future, this weighted similarity method may be generalized for use in a clinical setting to compare patient profiles consisting of genomic patterns combined with clinical attributes, such as diagnosis, treatment and response to therapy.
5,618 downloads bioinformatics
Mingxun Wang, Alan K. Jarmusch, Fernando Vargas, Alexander A. Aksenov, Julia M. Gauglitz, Kelly Weldon, Daniel Petras, Ricardo da Silva, Robby Quinn, Alexey V. Melnik, Justin J.J. van der Hooft, Andrés Mauricio Caraballo Rodríguez, Louis Felix Nothias, Christine M. Aceves, Morgan Panitchpakdi, Elizabeth Brown, Francesca Di Ottavio, Nicole Sikora, Emmanuel O. Elijah, Lara Labarta-Bajo, Emily C. Gentry, Shabnam Shalapour, Kathleen E. Kyle, Sara P. Puckett, Jeramie D. Watrous, Carolina S. Carpenter, Amina Bouslimani, Madeleine Ernst, Austin D. Swafford, Elina I. Zúñiga, Marcy J. Balunas, Jonathan L. Klassen, Rohit Loomba, Rob Knight, Nuno Bandeira, Pieter C. Dorrestein
We introduce a web-enabled small-molecule mass spectrometry (MS) search engine. To date, no tool can query all the public small-molecule tandem MS data in metabolomics repositories, greatly limiting the utility of these resources in clinical, environmental and natural product applications. Therefore, we introduce a Mass Spectrometry Search Tool (MASST) (https://proteosafe-extensions.ucsd.edu/masst/), that enables the discovery of molecular relationships among accessible public metabolomics and natural product tandem mass spectrometry data (MS/MS).
5,605 downloads genomics
Sofia A Quinodoz, Noah Ollikainen, Barbara Tabak, Ali Palla, Jan Marten Schmidt, Elizabeth Detmar, Mason Lai, Alexander Shishkin, Prashant Bhat, Vickie Trinh, Erik Aznauryan, Pamela Russell, Christine Cheng, Marko Jovanovic, Amy Chow, Patrick McDonel, Manuel Garber, Mitchell Guttman
Eukaryotic genomes are packaged into a 3-dimensional structure in the nucleus of each cell. There are currently two distinct views of genome organization that are derived from different technologies. The first view, derived from genome-wide proximity ligation methods (e.g. Hi-C), suggests that genome organization is largely organized around chromosomes. The second view, derived from in situ imaging, suggests a central role for nuclear bodies. Yet, because microscopy and proximity-ligation methods measure different aspects of genome organization, these two views remain poorly reconciled and our overall understanding of how genomic DNA is organized within the nucleus remains incomplete. Here, we develop Split-Pool Recognition of Interactions by Tag Extension (SPRITE), which moves away from proximity-ligation and enables genome-wide detection of higher-order DNA interactions within the nucleus. Using SPRITE, we recapitulate known genome structures identified by Hi-C and show that the contact frequencies measured by SPRITE strongly correlate with the 3-dimensional distances measured by microscopy. In addition to known structures, SPRITE identifies two major hubs of inter-chromosomal interactions that are spatially arranged around the nucleolus and nuclear speckles, respectively. We find that the majority of genomic regions exhibit preferential spatial association relative to one of these nuclear bodies, with regions that are highly transcribed by RNA Polymerase II organizing around nuclear speckles and transcriptionally inactive and centromere-proximal regions organizing around the nucleolus. Together, our results reconcile the two distinct pictures of nuclear structure and demonstrate that nuclear bodies act as inter-chromosomal hubs that shape the overall 3-dimensional packaging of genomic DNA in the nucleus.
5,579 downloads genomics
Cannabis has been cultivated for millennia with distinct cultivars providing either fiber and grain or tetrahydrocannabinol. Recent demand for cannabidiol rather than tetrahydrocannabinol has favored the breeding of admixed cultivars with extremely high cannabidiol content. Despite several draft Cannabis genomes, the genomic structure of cannabinoid synthase loci has remained elusive. A genetic map derived from a tetrahydrocannabinol/cannabidiol segregating population and a complete chromosome assembly from a high-cannabidiol cultivar together resolve the linkage of cannabidiolic and tetrahydrocannabinolic acid synthase gene clusters which are associated with transposable elements. High-cannabidiol cultivars appear to have been generated by integrating hemp-type cannabidiolic acid synthase gene clusters into a background of marijuana-type cannabis. Quantitative trait locus mapping suggests that overall drug potency, however, is associated with other genomic regions needing additional study.
5,573 downloads bioengineering
Our previous publication suggested CRISPR-Cas9 editing at the zygotic stage might unexpectedly introduce a multitude of subtle but unintended mutations, an interpretation that not surprisingly raised numerous questions. The key issue is that since parental lines were not available, might the reported variants have been inherited? To expand upon the limited available whole genome data on whether CRISPR-edited mice show more genetic variation, whole-genome sequencing was performed on two other mouse lines that had undergone a CRISPR-editing procedure. Again, parents were not available for either the Capn5 or Fblim1 CRISPR-edited mouse lines, so strain controls were examined. We also include verification of variants detected in the initial mouse line. Taken together, these whole-genome-sequencing-level results support the idea that in specific cases, CRISPR-Cas9 editing can precisely edit the genome at the organismal level and may not introduce numerous, unintended, off-target mutations.
5,566 downloads neuroscience
Julie A Harris, Stefan Mihalas, Karla E Hirokawa, Jennifer D. Whitesell, Joseph E. Knox, Amy Bernard, Phillip Bohn, Shiella Caldejon, Linzy Casal, Andrew Cho, David Feng, Nathalie Gaudreault, Charles R. Gerfen, Nile Graddis, Peter A. Groblewski, Alex Henry, Anh Ho, Robert Howard, Leonard Kuan, Jerome Lecoq, Jennifer Luviano, Stephen McConoghy, Marty T. Mortrud, Maitham Naeemi, Lydia Ng, Seung W Oh, Benjamin Ouellette, Staci A. Sorensen, Wayne Wakeman, Quanxin Wang, Ali Williford, John W Phillips, Allan Jones, Christof Koch, Hongkui Zeng
The mammalian cortex is a laminar structure composed of many cell types densely interconnected in complex ways. Recent systematic efforts to map the mouse mesoscale connectome provide comprehensive projection data on interareal connections, but not at the level of specific cell classes or layers within cortical areas. We present here a significant expansion of the Allen Mouse Brain Connectivity Atlas, with ~1,000 new axonal projection mapping experiments across nearly all isocortical areas in 49 Cre driver lines. Using 13 lines selective for cortical layer-specific projection neuron classes, we identify the differential contribution of each layer/class to the overall intracortical connectivity patterns. We find layer 5 (L5) projection neurons account for essentially all intracortical outputs. L2/3, L4, and L6 neurons contact a subset of the L5 cortical targets. We also describe the most common axon lamination patterns in cortical targets. Most patterns are consistent with previous anatomical rules used to determine hierarchical position between cortical areas (feedforward, feedback), with notable exceptions. While diverse target lamination patterns arise from every source layer/class, L2/3 and L4 neurons are primarily associated with feedforward type projection patterns and L6 with feedback. L5 has both feedforward and feedback projection patterns. Finally, network analyses revealed a modular organization of the intracortical connectome. By labeling interareal and intermodule connections as feedforward or feedback, we present an integrated view of the intracortical connectome as a hierarchical network.
5,540 downloads bioinformatics
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a zero-inflated negative binomial noise model, and nonlinear gene-gene or gene-dispersion interactions are captured. Our method scales linearly with the number of cells and can therefore be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.
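The zero-inflated negative binomial (ZINB) noise model named in this abstract can be written down compactly. The sketch below is a plain-Python illustration of a ZINB probability mass function and its negative log-likelihood, assuming the usual mean/dispersion parametrization with a separate dropout probability; it is not DCA's own code, and the function and parameter names are hypothetical.

```python
import math


def nb_log_pmf(x, mean, dispersion):
    """Log-probability of count x under a negative binomial with the given
    mean (> 0) and dispersion parameter theta (> 0)."""
    return (
        math.lgamma(x + dispersion)
        - math.lgamma(dispersion)
        - math.lgamma(x + 1)
        + dispersion * math.log(dispersion / (dispersion + mean))
        + x * math.log(mean / (dispersion + mean))
    )


def zinb_pmf(x, mean, dispersion, dropout):
    """Zero-inflated NB: with probability `dropout` the count is forced to
    zero; otherwise it is drawn from the negative binomial."""
    nb = math.exp(nb_log_pmf(x, mean, dispersion))
    if x == 0:
        return dropout + (1.0 - dropout) * nb
    return (1.0 - dropout) * nb


def zinb_negative_log_likelihood(counts, mean, dispersion, dropout):
    """Sum of -log P(x) over observed counts, the quantity a model would minimize."""
    return -sum(math.log(zinb_pmf(x, mean, dispersion, dropout)) for x in counts)


# Illustrative parameter values only (assumptions, not values from the paper).
total_probability = sum(zinb_pmf(x, 2.0, 1.5, 0.3) for x in range(500))
example_nll = zinb_negative_log_likelihood([0, 0, 1, 3, 0, 7], 2.0, 1.5, 0.3)
```

In an autoencoder setting, the mean, dispersion, and dropout would typically be predicted per gene and per cell rather than fixed as they are here.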
5,536 downloads genetics
Polygenic risk scores (PRS) are poised to improve biomedical outcomes via precision medicine. However, the major ethical and scientific challenge surrounding clinical implementation is that they are many-fold more accurate in European ancestry individuals than others. This disparity is an inescapable consequence of Eurocentric genome-wide association study biases. This highlights that clinical uses of PRS today would systematically afford greater improvement to European descent populations (unlike clinical biomarkers and prescription drugs, which may individually work better in some populations but do not ubiquitously perform far better in European populations). Early diversifying efforts show promise in levelling this vast imbalance, even when non-European sample sizes are considerably smaller than the largest studies to date. To realize the full and equitable potential of PRS, we must prioritize greater diversity in genetic studies and public dissemination of summary statistics to ensure that health disparities are not increased for those already most underserved.
5,525 downloads developmental biology
Ricard Argelaguet, Hisham Mohammed, Stephen J Clark, L Carine Stapel, Christel Krueger, Chantriolnt-Andreas Kapourani, Yunlong Xiang, Courtney Hanna, Sebastien Smallwood, Ximena Ibarra-Soria, Florian Buettner, Guido Sanguinetti, Felix Krueger, Wei Xie, Peter Rugg-Gunn, Gavin Kelsey, Wendy Dean, Jennifer Nichols, Oliver Stegle, John Marioni, Wolf Reik
Formation of the three primary germ layers during gastrulation is an essential step in the establishment of the vertebrate body plan. Recent studies employing single cell RNA-sequencing have identified major transcriptional changes associated with germ layer specification. Global epigenetic reprogramming accompanies these changes, but the role of the epigenome in regulating early cell fate choice remains unresolved, and the coordination between different epigenetic layers is unclear. Here we describe the first single cell triple-omics map of chromatin accessibility, DNA methylation and RNA expression during the exit from pluripotency and the onset of gastrulation in mouse embryos. We find dynamic dependencies between the different molecular layers, with evidence for distinct modes of epigenetic regulation. The initial exit from pluripotency coincides with the establishment of a global repressive epigenetic landscape, followed by the emergence of local lineage-specific epigenetic patterns during gastrulation. Notably, cells committed to mesoderm and endoderm undergo widespread coordinated epigenetic rearrangements, driven by loss of methylation in enhancer marks and a concomitant increase of chromatin accessibility. In striking contrast, the epigenetic landscape of ectodermal cells is already established in the early epiblast. Hence, regulatory elements associated with each germ layer are either epigenetically primed or epigenetically remodelled prior to overt cell fate decisions during gastrulation, providing the molecular logic for a hierarchical emergence of the primary germ layers.
5,512 downloads cancer biology
U1 small nuclear RNA (U1 snRNA), one of the most abundant noncoding RNAs in eukaryotic cells, plays an important role in the splicing of pre-mRNAs. Whereas other studies have focused on the primary function of U1 snRNA and the neurodegenerative diseases caused by its abnormalities, this study investigates how U1 snRNA over-expression affects the expression of genes on a genome-wide scale. We built a model of U1 snRNA over-expression in a rat cell line. By comparing the gene expression profiles of U1 snRNA over-expressed cells with those of their controls using microarray experiments, 916 genes or loci were identified as significantly differentially expressed. These 595 up-regulated genes and 321 down-regulated genes were further analyzed using annotations from GO terms and the KEGG database. As a result, three of 12 enriched pathways are well-known cancer pathways, while the other nine were associated with cancers in previous studies. Further analysis of 73 genes involved in the 12 pathways suggests that U1 snRNA regulates cancer gene expression. The microarray data with ID GSE84304 are available in the NCBI GEO database.
5,507 downloads ecology
While glacier ice cores provide climate information over tens to hundreds of thousands of years, study of microbes is challenged by ultra-low-biomass conditions, and virtually nothing is known about co-occurring viruses. Here we establish ultra-clean microbial and viral sampling procedures and apply them to two ice cores from the Guliya ice cap (northwestern Tibetan Plateau, China) to study these archived communities. This method reduced intentionally contaminating bacterial, viral, and free DNA to background levels in artificial-ice-core control experiments, and was then applied to two authentic ice cores to profile their microbes and viruses. The microbes differed significantly across the two ice cores, presumably representing the very different climate conditions at the time of deposition, similar to findings in other cores. Separately, viral particle enrichment and ultra-low-input quantitative viral metagenomic sequencing from ~520- and ~15,000-year-old ice revealed 33 viral populations (i.e., species-level designations) that represented four known genera and likely 28 novel viral genera (assessed by gene-sharing networks). In silico host predictions linked 18 of the 33 viral populations to co-occurring abundant bacteria, including Methylobacterium, Sphingomonas, and Janthinobacterium, indicating that viruses infected several abundant microbial groups. Depth-specific viral communities were observed, presumably reflecting differences in the environmental conditions among the ice samples at the time of deposition. Together, these experiments establish a clean procedure for studying microbial and viral communities in low-biomass glacier ice and provide baseline information for glacier viruses, some of which appear to be associated with the dominant microbes in these ecosystems.
5,506 downloads bioinformatics
We show that deep convolutional neural networks combined with non-linear dimension reduction enable reconstructing biological processes based on raw image data. We demonstrate this by reconstructing the cell cycle of Jurkat cells and disease progression in diabetic retinopathy. In further analysis of Jurkat cells, we detect and separate a subpopulation of dead cells in an unsupervised manner and, in classifying discrete cell cycle stages, we reach a 6-fold reduction in error rate compared to a recent approach based on boosting on image features. In contrast to previous methods, deep learning based predictions are fast enough for on-the-fly analysis in an imaging flow cytometer.
5,506 downloads genomics
Reference genome projects have historically selected inbred individuals to minimize heterozygosity and simplify assembly. We challenge this dogma and present a new approach designed specifically for heterozygous genomes. "Trio binning" uses short reads from two parental genomes to partition long reads from an offspring into haplotype-specific sets prior to assembly. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction. On a benchmark human trio, this method achieved high accuracy and recovered complex structural variants missed by alternative approaches. To demonstrate its effectiveness on a heterozygous genome, we sequenced an F1 cross between cattle subspecies Bos taurus taurus and Bos taurus indicus, and completely assembled both parental haplotypes with NG50 haplotig sizes >20 Mbp and 99.998% accuracy, surpassing the quality of current cattle reference genomes. We propose trio binning as a new best practice for diploid genome assembly that will enable new studies of haplotype variation and inheritance.
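The partitioning step described in this abstract can be illustrated with a toy example. The sketch below assumes that reads are assigned by counting k-mers found in only one parent's short reads; the abstract does not state the mechanism, so treat this as one plausible reading rather than the authors' implementation. All function names and sequences are hypothetical, and the sketch ignores reverse complements and sequencing errors.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def parent_specific_kmers(parent_a_reads, parent_b_reads, k):
    """Return (k-mers seen only in parent A, k-mers seen only in parent B)."""
    a = set().union(*(kmers(read, k) for read in parent_a_reads))
    b = set().union(*(kmers(read, k) for read in parent_b_reads))
    return a - b, b - a


def bin_read(read, only_a, only_b, k):
    """Assign an offspring read to the parent whose private k-mers it shares more of."""
    read_kmers = kmers(read, k)
    hits_a = len(read_kmers & only_a)
    hits_b = len(read_kmers & only_b)
    if hits_a > hits_b:
        return "A"
    if hits_b > hits_a:
        return "B"
    return "unassigned"


# Toy sequences for illustration only; real reads are far longer.
K = 4
only_a, only_b = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)
offspring_reads = ["CGTACGTA", "CGTTCGTA", "ACGT"]
bins = [bin_read(read, only_a, only_b, K) for read in offspring_reads]
```

Each bin of reads would then be assembled separately, which is what yields one assembly per haplotype.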
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!