Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 57,773 bioRxiv papers from 265,845 authors.
Most downloaded bioRxiv papers, since beginning of last month
56,319 results found.
343 downloads genetics
Imprinted genes are expressed from a single parental allele. In mammals, this unusual mode of transcription generally depends on the epigenetic silencing of one allele by DNA methylation (DNAme) established in the germline. While many species-specific imprinted orthologues have been documented in eutherians, the molecular mechanisms underlying the evolutionary switch from biallelic to imprinted expression are currently unknown. During mouse oogenesis, gametic differentially methylated regions (gDMRs) acquire DNAme in a process guided by transcription. Here we show that transcription initiating in proximal lineage-specific endogenous retroviruses (ERVs) is likely responsible for DNAme established in oocytes at 4/6 mouse-specific and 17/110 human-specific maternal imprinted gDMRs (igDMRs). The latter can be further divided into Catarrhini (Old World monkeys and apes)- or Hominoidea (ape)-specific igDMRs, which are embedded within transcription units initiating in ERVs specific to these primate lineages. Using CRISPR-Cas9 mutagenesis, we deleted the relevant murine-specific ERVs upstream of the maternally methylated genes Impact and Slc38a4. Strikingly, imprinting at these genes was lost in the offspring of females harboring these deletions and biallelic expression was observed. Our work reveals a novel evolutionary mechanism whereby maternally silenced genes arise from biallelically expressed progenitors.
341 downloads bioinformatics
Despite the success and fast adaptation of deep learning models in a wide range of fields, lack of interpretability remains an issue, especially in biomedical domains. A recent promising method to address this limitation is Integrated Gradients (IG), which identifies features associated with a prediction by traversing a linear path from a baseline to a sample. We extend IG with nonlinear paths, embedding in latent space, alternative baselines, and a framework to identify important features which make it suitable for interpretation of deep models for genomics.
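The linear-path form of Integrated Gradients described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy logistic model, its weights, the all-zero baseline, and the function names below are assumptions chosen only for illustration, and the path integral is approximated with a midpoint Riemann sum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy differentiable model (hypothetical): logistic regression output."""
    return sigmoid(np.dot(x, w) + b)

def model_grad(x, w, b):
    """Analytic gradient of the model output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG attributions along the straight line baseline -> x."""
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the linear path from the baseline to the sample.
    path = baseline + alphas[:, None] * (x - baseline)
    # Average the input gradients over the path.
    avg_grad = np.mean([model_grad(p, w, b) for p in path], axis=0)
    # Scale by the displacement from the baseline, feature by feature.
    return (x - baseline) * avg_grad

# Illustrative values only (not taken from the paper).
w = np.array([1.5, -2.0, 0.0])
b = 0.1
x = np.array([1.0, 0.5, 3.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)
# Completeness check: attributions should sum to f(x) - f(baseline).
gap = abs(attributions.sum() - (model(x, w, b) - model(baseline, w, b)))
```

A feature whose weight is zero receives zero attribution here, and the attributions sum (up to numerical error) to the difference between the prediction at the sample and at the baseline. The nonlinear paths and alternative baselines mentioned in the abstract would replace the `path` and `baseline` choices in this sketch.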
341 downloads neuroscience
The cortex sends a direct projection to the superior colliculus. What is largely unknown is whether (and if so how) the superior colliculus modulates activity in the cortex. Here, we directly investigate this issue, showing that optogenetic activation of superior colliculus changes the input-output relationship of neurons in somatosensory cortex during whisker movement, enhancing responses to low amplitude whisker deflections. While there is no direct pathway from superior colliculus to somatosensory cortex, we found that activation of superior colliculus drives spiking in the posterior medial (POm) nucleus of the thalamus via a powerful monosynaptic pathway. Furthermore, POm neurons receiving input from superior colliculus provide excitatory input to somatosensory cortex. Silencing POm abolished the capacity of superior colliculus to modulate cortical whisker responses. Our findings indicate that the superior colliculus, which plays a key role in attention, modulates sensory processing in somatosensory cortex via a powerful disynaptic pathway through the thalamus.
341 downloads bioinformatics
Background: The use of deep learning in analyses of DNA methylation data is beginning to emerge and distill non-linear relationships among high-dimensional data features. However, a generalized and user-friendly approach for execution, training, and interpreting deep learning models for methylation data is lacking. Results: We introduce and demonstrate the robust performance of MethylNet on downstream tasks of DNA methylation analysis, including cell-type deconvolution, pan-cancer classification, and subject age prediction. We interrogate the learned features from a pan-cancer classification to show high fidelity clustering of cancer subtypes, and compare the importance assigned to CpGs for the age and cell-type analyses to demonstrate concordance with expected biology. Conclusions: Our findings demonstrate high accuracy of end-to-end deep learning methods on methylation prediction tasks. Together, our results highlight the promise of future steps to use transfer learning, hyperparameter optimization and feature interpretations on DNA methylation data.
341 downloads bioinformatics
Single cell RNA-seq (scRNA-seq) has become the method of choice for analyzing mRNA distributions in heterogeneous cell populations. scRNA-seq only partially samples the cells in a tissue and the RNA in each cell, resulting in sparse data that challenge analysis. We develop a methodology that addresses scRNA-seq's sparsity through partitioning the data into metacells: disjoint, homogenous and highly compact groups of cells, each exhibiting only sampling variance. Metacells constitute local building blocks for clustering and quantitative analysis of gene expression, while not enforcing any global structure on the data, thereby maintaining statistical control and minimizing biases. We illustrate the MetaCell framework by re-analyzing cell type and transcriptional gradients in peripheral blood and whole organism scRNA-seq maps. Our algorithms are implemented in the new MetaCell R/C++ software package.
341 downloads neuroscience
Marina E Garrett, Sahar Manavi, Kate Roll, Douglas R Ollerenshaw, Peter A. Groblewski, Justin Kiggins, Xiaoxuan Jia, Linzy Casal, Kyla Mace, Ali Williford, Arielle Leon, Stefan Mihalas, Shawn R. Olsen
Cortical circuits are flexible and can change with experience and learning. However, the effects of experience on specific cell types, including distinct inhibitory types, are not well understood. Here we investigated how excitatory and VIP inhibitory cells in layer 2/3 of mouse visual cortex were impacted by visual experience in the context of a behavioral task. Mice learned to perform an image change detection task with a set of eight natural scene images, viewing these images thousands of times. Subsequently, during 2-photon imaging experiments, mice performed the task with these familiar images and three additional sets of novel images. Novel images evoked stronger overall activity in both excitatory and VIP populations, and familiar images were more sparsely coded by excitatory cells. The temporal dynamics of VIP activity differed markedly between novel and familiar images: VIP cells were stimulus-driven by novel images but displayed ramping activity during the inter-stimulus interval for familiar images. Moreover, when a familiar stimulus was omitted from an expected sequence, VIP cells showed extended ramping activity until the subsequent image presentation. This prominent shift in response dynamics suggests that VIP cells may adopt different modes of processing during familiar versus novel conditions.
340 downloads systems biology
Standardization of data and models facilitates effective communication, especially in computational systems biology. However, both the development and consistent use of standards and resources remains challenging. As a result, the amount, quality, and format of the information contained within systems biology models are not consistent and therefore present challenges for widespread use and communication. Here, we focused on these standards, resources, and challenges in the field of metabolic modeling by conducting a community-wide survey. We used this feedback to (1) outline the major challenges that our field faces and to propose solutions and (2) identify a set of features that defines what a "gold standard" metabolic network reconstruction looks like concerning content, annotation, and simulation capabilities. We anticipate that this community-driven outline will help the long-term development of community-inspired resources as well as produce high-quality, accessible models. More broadly, we hope that these efforts can serve as blueprints for other computational modeling communities to ensure continued development of both practical, usable standards and reproducible, knowledge-rich models.
340 downloads genomics
For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively-parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here we describe the first software capable of using paired-end sequencing to derive short contigs from de novo RAD data natively. Stacks version 2 employs a de Bruijn graph assembler to build contigs from paired-end reads and overlap those contigs with the corresponding single-end loci. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes -- generating RAD loci that are 400-800bp in length. To prove its recall and precision, we test the software with simulated data and compare reference-aligned and de novo analyses of three empirical datasets. We show that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo datasets.
340 downloads immunology
David Schafflick, Chenling Xu, Maike Hartlehnert, Michael Cole, Tobias Lautwein, Andreas Schulte-Mecklenbeck, Jolien Wolbert, Michael O Heming, Sven G. Meuth, Tanja Kuhlmann, Catharina C Gross, Heinz Wiendl, Nir Yosef, Gerd Meyer zu Horste
Cerebrospinal fluid (CSF) protects the central nervous system (CNS) and analyzing CSF aids the diagnosis of CNS diseases, but our understanding of CSF leukocytes remains superficial. Here, we first provide a transcriptional map of single leukocytes in CSF compared to blood. Leukocyte composition and transcriptome were compartment-specific with CSF-enrichment of myeloid dendritic cells and a border-associated phenotype of monocytes. Second, we tested how multiple sclerosis (MS), an autoimmune disease of the CNS, affected both compartments. MS increased transcriptional diversity in blood, while it preferentially increased cell type diversity in CSF. In addition to the known expansion of B lineage cells, we identified an increase of cytotoxic-phenotype and follicular T helper (TFH) cells in the CSF. In mice, TFH cells accordingly promoted B cell infiltration into the CNS and severity of MS animal models. Immune mechanisms in MS are thus highly compartmentalized and indicate local T/B cell interaction.
339 downloads cancer biology
Korsuk Sirinukunwattana, Enric Domingo, Susan Richman, Keara L Redmond, Andrew Blake, Clare Verrill, Simon J Leedham, Aikaterini Chatzipli, Claire Hardy, Celina Whalley, Chieh-Hsi Wu, Andrew D Beggs, Ultan McDermott, Philip Dunne, Angela A Meade, Steven M Walker, Graeme I Murray, Leslie M Samuel, Matthew Seymour, Ian Tomlinson, Philip Quirke, Tim Maughan, Jens Rittscher, Viktor Koelzer
Image analysis is a cost-effective tool to associate complex features of tissue organisation with molecular and outcome data. Here we predict consensus molecular subtypes (CMS) of colorectal cancer (CRC) from standard H&E sections using deep learning. Domain adversarial training of a neural classification network was performed using 1,553 tissue sections with comprehensive multi-omic data from three independent datasets. Image-based consensus molecular subtyping (imCMS) accurately classified CRC whole-slide images and preoperative biopsies, spatially resolved intratumoural heterogeneity and provided accurate secondary calls with higher discriminatory power than bioinformatic prediction. In all three cohorts imCMS established sensible classification in CMS unclassified samples, reproduced expected correlations with (epi)genomic alterations and effectively stratified patients into prognostic subgroups. Leveraging artificial intelligence for the development of novel biomarkers extracted from histological slides with molecular and biological interpretability has remarkable potential for clinical translation.
339 downloads genomics
Chromatin folding below the scale of topologically associating domains (TADs) remains largely unexplored in mammals. Here, we used a high-resolution 3C-based method, Micro-C, to probe links between 3D-genome organization and transcriptional regulation in mouse stem cells. Combinatorial binding of transcription factors, cofactors, and chromatin modifiers spatially segregate TAD regions into "microTADs" with distinct regulatory features. Enhancer-promoter and promoter-promoter interactions extending from the edge of these domains predominantly link co-regulated loci, often independently of CTCF/Cohesin. Acute inhibition of transcription disrupts the gene-related folding features without altering higher-order chromatin structures. Intriguingly, we detect "two-start" zig-zag 30-nanometer chromatin fibers. Our work uncovers the finer-scale genome organization that establishes novel functional links between chromatin folding and gene regulation.
338 downloads bioinformatics
In single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires extensive considerations of program efficiency and method selection. In order to reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient, and they also do not assume that the data follow certain statistical distributions. The package is extensible and modular, which would facilitate further development of functionalities for future requirements. The open source package is distributed under the terms of MIT license at https://pypi.org/project/scedar.
338 downloads bioinformatics
Drug discovery is a lengthy and costly process and has faced a period of declining productivity within the last two decades. As a consequence, integrative data-driven approaches are nowadays on the rise in pharmaceutical research, making use of an inter-connected (network) view on diseases. In addition, evidence-based decisions are facilitated by studying the time evolution of innovation trends in drug discovery. In this paper a new approach leveraging data mining and data integration for inspecting target innovation trends protein family-wise is presented. The study highlights protein families which are receiving emerging interest in the drug discovery community (mainly kinases and G protein coupled receptors) and those with areas of interest in target space that have just emerged in the scientific literature (mainly kinases and transporters), highlighting novel opportunities for drug intervention. In order to delineate the evolution of target-driven research interest from a biological perspective, trends in biological process annotations from Gene Ontology (GO) and disease annotations from DisGeNet for major target families are captured. The analysis reveals an increasing interest in targets related to immune system processes, and a recurrent trend for targets involved in circulatory system processes. At the level of disease annotations, targets associated with, e.g., cancer-related pathologies as well as intellectual disability and schizophrenia are increasingly investigated nowadays. Can this knowledge be used to study the “movement of targets” in a network view and unravel new links between diseases and biological processes? We tackled this question by creating dynamic network representations considering data from different time periods. The dynamic network for immune system process-associated targets suggests that, e.g., breast cancer as well as schizophrenia are linked to the same targets (cannabinoid receptor CB2 and VEGFR2), thus suggesting similar treatment options which could be confirmed by literature search. The methodology has the potential to identify other drug repurposing candidates and enables researchers to capture trends in research attention in target space at an early stage. The KNIME workflows and R scripts used in this study are publicly available from https://github.com/BZdrazil/Moving_Targets.
338 downloads plant biology
Amy Watson, Sreya Ghosh, Matthew J Williams, William S. Cuddy, James Simmonds, María-Dolores Rey, M. Asyraf Md. Hatta, Alison Hinchliffe, Andrew Steed, Daniel Reynolds, Nikolai Adamski, Andy Breakspear, Andrey Korolev, Tracey Rayner, Laura E. Dixon, Adnan Riaz, William Martin, Merrill Ryan, David Edwards, Jacqueline Batley, Harsh Raman, Christian Rogers, Claire Domoney, Graham Moore, Wendy Harwood, Paul Nicholson, Mark J. Dieters, Ian H. DeLacy, Ji Zhou, Cristobal Uauy, Scott A. Boden, Robert F. Park, Brande B. H. Wulff, Lee T. Hickey
The growing human population and a changing environment have raised significant concern for global food security, with the current improvement rate of several important crops inadequate to meet future demand. This slow improvement rate is attributed partly to the long generation times of crop plants. Here we present a method called 'speed breeding', which greatly shortens generation time and accelerates breeding and research programs. Speed breeding can be used to achieve up to 6 generations per year for spring wheat (Triticum aestivum), durum wheat (T. durum), barley (Hordeum vulgare), chickpea (Cicer arietinum), and pea (Pisum sativum) and 4 generations for canola (Brassica napus), instead of 2-3 under normal glasshouse conditions. We demonstrate that speed breeding in fully-enclosed controlled-environment growth chambers can accelerate plant development for research purposes, including phenotyping of adult plant traits, mutant studies, and transformation. The use of supplemental lighting in a glasshouse environment allows rapid generation cycling through single seed descent and potential for adaptation to larger-scale crop improvement programs. Cost-saving through LED supplemental lighting is also outlined. We envisage great potential for integrating speed breeding with other modern crop breeding technologies, including high-throughput genotyping, genome editing, and genomic selection, accelerating the rate of crop improvement.
337 downloads ecology
Remote sensing can transform the speed, scale, and cost of biodiversity and forestry surveys. Data acquisition currently outpaces the ability to identify individual organisms in high resolution imagery. We outline an approach for identifying tree-crowns in true color, or red/green/blue (RGB), imagery using a deep learning detection network. Individual crown delineation is a persistent challenge in studies of forested ecosystems and has primarily been addressed using three-dimensional LIDAR. We show that deep learning models can leverage existing LIDAR-based unsupervised delineation approaches to initially train an RGB crown detection model, which is then refined using a small number of hand-annotated RGB images. We validate our proposed approach using an open-canopy site in the National Ecological Observatory Network (NEON). Our results show that combining LIDAR and RGB methods in a self-supervised model improves predictions of trees in natural landscapes. The addition of a small number of hand-annotated trees improved performance over the initial self-supervised model. While undercounting of individual trees in complex canopies remains an area of development, deep learning can increase the performance of remotely sensed tree surveys.
337 downloads systems biology
Microscopy image analysis is a major bottleneck in quantification of single-cell microscopy data, typically requiring human supervision and curation, which limit both accuracy and throughput. To address this, we developed a deep learning-based image analysis pipeline that performs segmentation, tracking, and lineage reconstruction. Our analysis focuses on time-lapse movies of Escherichia coli cells trapped in a "mother machine" microfluidic device, a scalable platform for long-term single-cell analysis that is widely used in the field. While deep learning has been applied to cell segmentation problems before, our approach is fundamentally innovative in that it also uses machine learning to perform cell tracking and lineage reconstruction. With this framework we are able to get high fidelity results (1% error rate), without human supervision. Further, the algorithm is fast, with complete analysis of a typical frame containing ~150 cells taking <700msec. The framework is not constrained to a particular experimental set up and has the potential to generalize to time-lapse images of other organisms or different experimental configurations. These advances open the door to a myriad of applications including real-time tracking of gene expression and high throughput analysis of strain libraries at single-cell resolution.
336 downloads microbiology
Rather than acting as rigid symmetrical shells to protect and transmit their genomes, the capsids of non-enveloped, icosahedral viruses co-ordinate multiple, essential processes during the viral life-cycle, and undergo extensive conformational rearrangements to deliver these functions. Capturing conformational flexibility has been challenging, yet could be key in understanding and combating infections that viruses cause. Noroviruses are non-enveloped, icosahedral viruses of global importance to human health. They are a common cause of acute non-bacterial gastroenteritis, yet no vaccines or antiviral agents specific to norovirus are available. Here, we use cryo-electron microscopy to study the high-resolution solution structures of infectious, inactivated and mutant virions of murine norovirus (MNV) as a model for human noroviruses. Together with genetic studies, we show that the viral capsid is highly dynamic. While there is little change to the shell domain of the capsid, the protruding domains that radiate from this are flexible and adopt distinct states both independently and synchronously. In doing so the viral capsid is able to sample a defined range of conformational space, with implications for the maintenance of virion stability and infectivity. These data will aid in developing the first generation of effective control measures against this virus.
336 downloads bioinformatics
Background: Enhancers play a fundamental role in orchestrating cell state and development. Although several methods have been developed to identify enhancers, linking them to their target genes is still an open problem. Several theories have been proposed on the functional mechanisms of enhancers, which triggered the development of various methods to infer promoter enhancer interactions (PEIs). The advancement of high-throughput techniques describing the three-dimensional organisation of the chromatin paved the way to pinpoint long-range PEIs. Here we investigated whether including PEIs in computational models for the prediction of gene expression improves performance and interpretability. Results: We have extended our Tepic framework to include DNA contacts deduced from chromatin conformation capture experiments and compared various methods to determine PEIs using predictive modelling of gene expression from chromatin accessibility data and predicted transcription factor (TF) motif data. We found that including long-range PEIs deduced from both HiC and HiChIP data indeed improves model performance. We designed a novel machine learning approach that allows TFs in distal loop and promoter regions to be prioritized with respect to their importance for gene expression regulation. Our analysis revealed a set of core TFs that are part of enhancer-promoter loops involving YY1 in different cell lines. Conclusion: We show that the integration of chromatin conformation data improves gene expression prediction, underlining the importance of enhancer looping for gene expression regulation. Our general approach can be used to prioritize TFs that are involved in distal and promoter-proximal regulation using accessibility, conformation and expression data.
336 downloads neuroscience
The development of new imaging and optogenetics techniques to study the dynamics of large neuronal circuits is generating datasets of unprecedented volume and complexity, demanding the development of appropriate analysis tools. We present a tutorial for the use of a comprehensive computational toolbox for the analysis of neuronal population activity imaging. It consists of tools for image pre-processing and segmentation, estimation of significant single-neuron single-trial signals, mapping event-related neuronal responses, detection of activity-correlated neuronal clusters, exploration of population dynamics, and analysis of clusters' features against surrogate control datasets. They are integrated in a modular and versatile processing pipeline, adaptable to different needs. The clustering module is capable of detecting flexible, dynamically activated neuronal assemblies, consistent with the distributed population coding of the brain. We demonstrate the suitability of the toolbox for a variety of calcium imaging datasets, and provide a case study to explain its implementation.
335 downloads biophysics
The acquisition of cryo-electron microscopy (cryo-EM) data from biological specimens is currently largely uncoupled from subsequent data evaluation, correction and processing. Therefore, the acquisition strategy is difficult to optimize during data collection, often leading to suboptimal microscope usage and disappointing results. Here we provide Warp, a software for real-time evaluation, correction, and processing of cryo-EM data during their acquisition. Warp evaluates and monitors key parameters for each recorded micrograph or tomographic tilt series in real time. Warp also rapidly corrects micrographs for global and local motion, and estimates the local defocus with the use of novel algorithms. The software further includes a deep learning-based particle picking algorithm that rivals human accuracy to make the pre-processing pipeline truly automated. The output from Warp can be directly fed into established tools for particle classification and 3D image reconstruction. In a benchmarking study we show that Warp automatically processed a published cryo-EM data set for influenza virus hemagglutinin, leading to an improvement of the nominal resolution from 3.9 Å to 3.2 Å. Warp is easy to install, computationally inexpensive, and has an intuitive and streamlined user interface.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!