Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,232 bioRxiv papers from 276,305 authors.
Most downloaded bioRxiv papers, since beginning of last month
49,504 results found.
490 downloads biochemistry
Proteome characterization relies heavily on tandem mass spectrometry (MS/MS) and is thus associated with instrumentation complexity, lengthy analysis time, and limited duty-cycle. It has long been tempting to implement approaches that do not require MS/MS, yet they have consistently failed to achieve meaningful depth of quantitative proteome coverage within short experimental times, which is particularly important for clinical or biomarker discovery applications. Here, we report on the first successful attempt to develop a truly MS/MS-free and label-free method for bottom-up proteomics. We demonstrate identification of 1000 protein groups for a standard HeLa cell line digest using 5-minute LC gradients. The amount of loaded sample was varied in a range from 1 ng to 500 ng, and the method demonstrated 10-fold higher sensitivity compared with the standard MS/MS-based approach. Due to significantly higher sequence coverage obtained by the developed method, it outperforms all popular MS/MS-based label-free quantitation approaches.
485 downloads bioinformatics
We introduce Salmon, a new method for quantifying transcript abundance from RNA-seq reads that is highly-accurate and very fast. Salmon is the first transcriptome-wide quantifier to model and correct for fragment GC content bias, which we demonstrate substantially improves the accuracy of abundance estimates and the reliability of subsequent differential expression analysis compared to existing methods that do not account for these biases. Salmon achieves its speed and accuracy by combining a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. These innovations yield both exceptional accuracy and order-of-magnitude speed benefits over alignment-based methods.
484 downloads evolutionary biology
An organism Tree of Life (organism ToL) is a conceptual and metaphorical tree to capture a simplified narrative of the evolutionary course and kinship among the extant organisms of today. Such a tree cannot be experimentally validated, but may be reconstructed based on characteristics associated with the organisms. Since the whole genome sequence of an organism is, at present, the most comprehensive descriptor of the organism, a genome ToL can be an empirically derivable surrogate for the organism ToL. However, a genome ToL has been impossible to construct for the practical reason that experimentally determining the whole genome sequences of a large number of diverse organisms was technically impossible. Thus, for several decades gene ToLs based on selected genes have been commonly used as a surrogate for the organism ToL. This situation changed dramatically during the last several decades due to rapid advances in DNA sequencing technology. Here we present a genome ToL, which shows that (a) whole genome information can be compared without sequence alignment, (b) all extant organisms can be classified into six large groups and (c) all the founders of the groups have emerged in a Deep Burst at the very beginning of the emergence of Life on Earth.
483 downloads genomics
In order to identify the molecular determinants of human diseases, such as cancer, that arise from a diverse range of tissue, it is necessary to accurately distinguish normal and pathogenic cellular programs. Here we present a novel approach for single-cell multi-omic deconvolution of healthy and pathological molecular signatures within phenotypically heterogeneous malignant cells. By first creating immunophenotypic, transcriptomic and epigenetic single-cell maps of hematopoietic development from healthy peripheral blood and bone marrow mononuclear cells, we identify cancer-specific transcriptional and chromatin signatures from single cells in a cohort of mixed phenotype acute leukemia (MPAL) clinical samples. MPALs are a high-risk subtype of acute leukemia characterized by a heterogeneous malignant cell population expressing both myeloid and lymphoid lineage-specific markers. Our results reveal widespread heterogeneity in the pathogenetic gene regulatory and expression programs across patients, yet relatively consistent changes within patients even across malignant cells occupying diverse portions of the hematopoietic lineage. An integrative analysis of transcriptomic and epigenetic maps identifies 91,601 putative gene-regulatory interactions and classifies a number of transcription factors that regulate leukemia specific genes, including RUNX1-linked regulatory elements proximal to CD69. This work provides a template for integrative, multi-omic analysis for the interpretation of pathogenic molecular signatures in the context of developmental origin.
482 downloads molecular biology
High-throughput amplicon sequencing of large genomic regions represents a challenge for existing short-read technologies. Long-read technologies can in theory sequence large genomic regions, but they currently suffer from high error rates. Here, we report a high-throughput amplicon sequencing approach that combines unique molecular identifiers (UMIs) with Oxford Nanopore sequencing to generate single-molecule consensus sequences of large genomic regions. We demonstrate the approach by generating nearly 10,000 full-length ribosomal RNA (rRNA) operons of roughly 4,400 bp in length from a mock microbial community consisting of eight bacterial species using a single Oxford Nanopore MinION flowcell. The mean error rate of the consensus sequences was 0.03%, with no detectable chimeras due to a rigorous UMI-barcode filtering strategy. The simplicity and accessibility of this method pave the way for widespread use of high-accuracy amplicon sequencing in a variety of genomic applications.
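The consensus idea in this abstract can be illustrated with a minimal sketch, assuming the reads have already been aligned to equal length and tagged with their UMI. The function name `umi_consensus`, the input format, and the `min_reads` threshold are hypothetical; this is not the authors' pipeline, which also has to handle alignment, indels, and chimera filtering.

```python
from collections import Counter, defaultdict


def umi_consensus(tagged_reads, min_reads=3):
    """Group equal-length reads by UMI and majority-vote each position.

    tagged_reads: iterable of (umi, sequence) pairs (hypothetical input format).
    min_reads: UMI groups with fewer reads are dropped (assumed threshold).
    """
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to outvote random errors
        # zip(*seqs) yields one column of bases per position
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Toy data, invented for illustration only
reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACCTAC"),  # one sequencing error at position 2
    ("CCTA", "TTGACA"),  # singleton UMI, discarded at the default threshold
]
print(umi_consensus(reads))
```

Because errors in individual reads are mostly independent, the per-position vote within one UMI group tends to recover the sequence of the original molecule.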
481 downloads biophysics
The ClpXP degradation machine consists of a hexameric AAA+ unfoldase (ClpX) and a pair of heptameric serine protease rings (ClpP) that unfold, translocate, and subsequently degrade client proteins. ClpXP is an important target for drug development against infectious diseases. Although structures are available for isolated ClpX and ClpP rings, it remains unknown how symmetry mismatched ClpX and ClpP work in tandem for processive substrate translocation into the ClpP proteolytic chamber. Here we present cryo-EM structures of the substrate-bound ClpXP complex from Neisseria meningitidis at 2.3 to 3.3 Å resolution. The structures allow development of a model in which the cyclical hydrolysis of ATP is coupled to concerted motions of ClpX loops that lead to directional substrate translocation and ClpX rotation relative to ClpP. Our data add to the growing body of evidence that AAA+ molecular machines generate translocating forces by a common mechanism.
478 downloads cell biology
Gordana Wutz, Brian Tyler Glenn St. Hilaire, Rene Ladurner, Roman Stocsits, Kota Nagasaka, Benoit Pignard, Adrian Sanborn, Wen Tang, Csilla Varnai, Miroslav Ivanov, Stefan Schoenfelder, Petra van der Lelij, Xingfan Huang, Gerhard Duernberger, Elisabeth Roitinger, Karl Mechtler, Iain Finley Davidson, Peter Fraser, Erez Lieberman Aiden, Jan-Michael Peters
Eukaryotic genomes are folded into loops. It is thought that these are formed by cohesin complexes via extrusion, either until loop expansion is arrested by CTCF or until cohesin is removed from DNA by WAPL. Although WAPL limits cohesin chromatin residence time to minutes, it has been reported that some loops exist for hours. How these loops can persist is unknown. We show that during G1-phase, mammalian cells contain acetylated cohesinSTAG1 which binds chromatin for hours, whereas cohesinSTAG2 binds chromatin for minutes. Our results indicate that CTCF and the acetyltransferase ESCO1 protect a subset of cohesinSTAG1 complexes from WAPL, thereby enabling formation of long and presumably long-lived loops, and that ESCO1, like CTCF, contributes to boundary formation in chromatin looping. Our data are consistent with a model of nested loop extrusion, in which acetylated cohesinSTAG1 forms stable loops between CTCF sites, demarcating the boundaries of more transient cohesinSTAG2 extrusion activity.
477 downloads developmental biology
The observation of individuals attaining remarkable ages, and their concentration into geographic sub-regions or 'blue zones', has generated considerable scientific interest. Proposed drivers of remarkable longevity include high vegetable intake, strong social connections, and genetic markers. Here, we reveal new predictors of remarkable longevity and 'supercentenarian' status. In the United States, supercentenarian status is predicted by the absence of vital registration. The state-specific introduction of birth certificates is associated with a 69-82% fall in the number of supercentenarian records. In Italy, which has more uniform vital registration, remarkable longevity is instead predicted by low per capita incomes and a short life expectancy. Finally, the designated 'blue zones' of Sardinia, Okinawa, and Ikaria corresponded to regions with low incomes, low literacy, high crime rate and short life expectancy relative to their national average. As such, relative poverty and short lifespan constitute unexpected predictors of centenarian and supercentenarian status, and support a primary role of fraud and error in generating remarkable human age records.
476 downloads microbiology
David C. Danko, Daniela Bezdan, Ebrahim Afshinnekoo, Sofia Ahsanuddin, Chandrima Bhattacharya, Daniel J Butler, Kern Rei Chng, Francesca De Filippis, Jochen Hecht, Andre Kahles, Mikhail Karasikov, Nikos C. Kyrpides, Marcus H Y Leung, Dmitry Meleshko, Harun Mustafa, Beth Mutai, Russell Y Neches, Amanda Ng, Marina Nieto-Caballero, Olga Nikolayeva, Tatyana Nikolayeva, Eileen Png, Jorge L Sanchez, Heba Shaaban, Maria A Sierra, Xinzhao Tong, Ben Young, Josue Alicea, Malay Bhattacharyya, Ran Blekhman, Eduardo Castro-Nallar, Ana M Cañas, Aspassia D Chatziefthimiou, Robert W Crawford, Youping Deng, Christelle Desnues, Emmanuel Dias-Neto, Daisy Donnellan, Marius Dybwad, Eran Elhaik, Danilo Ercolini, Alina Frolova, Alexandra B Graf, David C Green, Iman Hajirasouliha, Mark Hernandez, Gregorio Iraola, Soojin Jang, Angela Jones, Frank J Kelly, Kaymisha Knights, Paweł P Łabaj, Patrick K. H. Lee, Levy Shawn, Per Ljungdahl, Abigail Lyons, Gabriella Mason-Buck, Ken McGrath, Emmanuel F Mongodin, Milton Ozorio Moraes, Niranjan Nagarajan, Houtan Noushmehr, Manuela Oliveira, Stephan Ossowski, Olayinka O Osuolale, Orhan Özcan, David Paez-Espino, Nicolas Rascovan, Hugues Richard, Gunnar Rätsch, Lynn M Schriml, Torsten Semmler, Osman U Sezerman, Leming Shi, Le Huu Song, Haruo Suzuki, Denise Syndercombe Court, Dominique Thomas, Scott W Tighe, Klas I Udekwu, Juan A. Ugalde, Brandon Valentine, Dimitar I Vassilev, Elena Vayndorf, Thirumalaisamy P. Velavan, María M Zambrano, Jifeng Zhu, Sibo Zhu, Christopher E Mason, The International MetaSUB Consortium
Although studies have shown that urban environments and mass-transit systems have distinct genetic profiles, there are no systematic studies of these dense, human/microbial ecosystems around the world. To address this gap in knowledge, we created a global metagenomic and antimicrobial resistance (AMR) atlas of urban mass transit systems from 58 cities, spanning 3,741 samples and 4,424 taxonomically-defined microorganisms collected from 2015 to 2017. The map provides annotated, geospatial details about microbial strains, functional genetics, antimicrobial resistance, and novel genetic elements, including 10,928 novel predicted viral species. Urban microbiomes often resemble human commensal microbiomes from the skin and airways, but also contain a consistent “core” of 61 species which are predominantly not human commensal species. Conversely, samples may be accurately (91.4%) classified to their city-of-origin using a linear support vector machine over taxa. These data also show that AMR density across cities varies by several orders of magnitude, including many AMRs present on plasmids with specific cosmopolitan distributions. Together, these results constitute a high-resolution global metagenomic atlas, which enables the discovery of new genetic components of the built human environment, highlights potential forensic applications, and provides an essential first draft of the global AMR burden of the world’s cities.
473 downloads genomics
Aukje Marieke Oudelaar, Robert A Beagrie, Matthew Gosden, Sara De Ornellas, Emily Georgiades, Jon Kerry, Daniel Hidalgo, Joana Carrelha, Arun Shivalingam, Afaf H. El-Sagheer, Jelena M Telenius, Tom Brown, Veronica J Buckle, Merav Socolovsky, Douglas R Higgs, Jim R Hughes
Precise gene expression patterns during mammalian development are controlled by regulatory elements in the non-coding genome. Active enhancer elements interact with gene promoters within Topologically Associating Domains (TADs). However, the precise relationships between chromatin accessibility, nuclear architecture and gene activation are not completely understood. Here, we present Tiled-C, a new Chromosome Conformation Capture (3C) technology, which allows for the generation of high-resolution contact matrices of loci of interest at unprecedented depth, and which can be optimized for as few as 2,000 cells of input material. We have used this approach to study the chromatin architecture of the mouse α-globin locus through in vivo erythroid differentiation. Integrated analysis of matched chromatin accessibility and single-cell expression data shows that the α-globin locus lies within a pre-existing TAD, which is established prior to activation of the domain. During differentiation, this TAD undergoes further sub-compartmentalization as regulatory elements gradually become accessible and specific interactions between enhancers and promoters are formed. As these chromatin changes develop, gene expression is progressively upregulated. Our findings demonstrate that chromatin architecture and gene activation are tightly linked during development and provide insights into the distinct mechanisms contributing to the establishment of tissue-specific chromatin structures.
470 downloads neuroscience
To understand activity in the visual cortex, researchers typically investigate how parametric changes in stimuli affect neural activity. A fundamental tenet of this approach is that the response properties of neurons in one context, e.g. color stimuli, are representative of responses in other contexts, e.g. natural scenes. This assumption is not often tested. Here, for neurons in macaque area V4, we first estimated tuning curves for hue by presenting artificial stimuli of varying hue, and then tested whether these would correlate with hue tuning curves estimated from responses to natural images. We found that neurons' hue tuning on artificial stimuli was not representative of their hue tuning on natural images, even if the neurons were strongly color-responsive. One explanation of this result is that neurons in V4 respond to interactions between hue and other visual features. This finding exemplifies how tuning curves estimated by varying a small number of stimulus features can communicate a small and potentially unrepresentative slice of the neural response function.
469 downloads biophysics
Chromatin conformation regulates gene expression and thus constant remodeling of chromatin structure is essential to guarantee proper cell function. To gain insight into the spatio-temporal organization of the genome, we employ high-density photo-activated localization microscopy and deep learning to obtain temporally resolved super-resolution images of chromatin in vivo. In combination with high-resolution dense motion reconstruction, we confirm the existence of elongated ~45 to 90 nm wide chromatin ‘blobs’, which appear to be dynamically associating chromatin fragments in close physical and genomic proximity and adopt TAD-like interactions in the time-average limit. We found the chromatin structure exhibits a spatio-temporal correlation extending ~4 μm in space and tens of seconds in time, while chromatin dynamics are correlated over ~6 μm and outlast 40 s. Notably, chromatin structure and dynamics are closely interrelated, which may constitute a mechanism to grant access to regions with high local chromatin concentration.
467 downloads bioinformatics
Genome-wide association analyses have uncovered multiple genomic regions associated with type 2 diabetes (T2D), but identification of the causal variants at these remains a challenge. There is growing interest in the potential of deep learning models, which predict epigenome features from DNA sequence, to support inference concerning the regulatory effects of disease-associated variants. Here, we evaluate the advantages of training convolutional neural network (CNN) models on a broad set of epigenomic features collected in a single disease-relevant tissue (pancreatic islets in the case of T2D), as opposed to models trained on multiple human tissues. We report convergence of CNN-based metrics of regulatory function with conventional approaches to variant prioritization: genetic fine-mapping and regulatory annotation enrichment. We demonstrate that CNN-based analyses can refine association signals at T2D-associated loci and provide experimental validation for one such signal. We anticipate that these approaches will become routine in downstream analyses of GWAS.
The bone marrow (BM) constitutes the primary site for life-long blood production and skeletal regeneration. However, its cellular composition and the spatial organization into distinct niches remains controversial. Here, we combine single-cell and spatially resolved transcriptomics to systematically map the molecular and cellular composition of the endosteal, sinusoidal, and arteriolar BM niches. This allowed us to transcriptionally profile all major BM resident cell types, determine their localization, and clarify the cellular and spatial sources of key growth factors and cytokines. Our data demonstrate that previously unrecognized Cxcl12-abundant reticular (CAR) cell subsets (i.e. Adipo- and Osteo- CAR cells) differentially localize to sinusoidal or arteriolar surfaces, locally act as professional cytokine secreting cells, and thereby establish distinct peri-vascular micro-niches. Importantly, we also demonstrate that the 3-dimensional organization of the BM can be accurately inferred from single-cell gene expression data using the newly developed RNA-Magnet algorithm. Together, our study reveals the cellular and spatial organization of BM niches, and offers a novel strategy to dissect the complex organization of whole organs in a systematic manner.
461 downloads neuroscience
As we navigate the world we learn about associations among events, extract relational structures, and store them in memory. This relational knowledge, in turn, enables generalization, inference, and hierarchical planning. Here we investigated relational knowledge during spatial navigation as multiscale predictive representations in the brain. We hypothesized that these representations are organized at multiple scales along posterior to anterior hierarchies in prefrontal and hippocampal regions. To test this, we conducted model based representational similarity analyses of neuroimaging data measured during virtual reality navigation of familiar and unfamiliar paths with realistically long distances. We tested the pattern similarity of each point, along each navigational path, to a weighted sum of its successor points within different temporal horizons. Predictive similarity was significantly higher for familiar paths. Overall, anterior PFC showed predictive horizons at the largest scales (~875m) and posterior hippocampus at the lowest (~25m), while the anterior hippocampus (~175m), prepolar PFC, and orbitofrontal regions (~350m) were in between. These findings support the idea that predictive representations are maintained at higher scales of abstraction in the anterior PFC, and unfolded at lower scales by prepolar PFC and hippocampal regions. This representational hierarchy can support generalization, hierarchical planning, and subgoals at multiple scales.
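The comparison described in this abstract, between the pattern at one point on a path and a weighted sum of its successor points within a horizon, can be sketched as follows. This is only an illustration under stated assumptions: the exponential discount `gamma`, the use of Pearson correlation as the similarity measure, and the names `pearson` and `predictive_similarity` are all hypothetical choices, not details taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def predictive_similarity(patterns, t, horizon, gamma=0.9):
    """Correlate the pattern at step t with a discounted mean of its successors.

    patterns: list of activity patterns (lists of floats), one per path point.
    horizon: number of successor points to include (the predictive scale).
    gamma: assumed discount factor; nearer successors get larger weights.
    """
    successors = patterns[t + 1 : t + 1 + horizon]
    if not successors:
        raise ValueError("no successor points within the horizon")
    weights = [gamma ** k for k in range(len(successors))]
    total = sum(weights)
    # Weighted average of successor patterns, computed feature by feature
    target = [
        sum(w * p[i] for w, p in zip(weights, successors)) / total
        for i in range(len(patterns[t]))
    ]
    return pearson(patterns[t], target)


# Toy patterns, invented for illustration only
toy_path = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 1.0, 2.0]]
print(predictive_similarity(toy_path, t=0, horizon=1))
```

Repeating the calculation with different `horizon` values for the same region would give a rough analogue of asking which predictive scale best matches that region's patterns.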
458 downloads neuroscience
Physical exercise seems universally beneficial to human and animal health, slowing cognitive aging and neurodegeneration. Cognitive benefits are tied to increased plasticity and reduced inflammation within the hippocampus, yet little is known about the factors and mechanisms mediating these effects. We discovered that 'runner' plasma, collected from voluntarily running mice and infused into sedentary mice, recapitulates the cellular and functional benefits of exercise on the brain. Importantly, runner plasma reduces baseline neuroinflammatory gene expression and prominently suppresses experimentally induced brain inflammation. Plasma proteomic analysis shows a striking increase in complement cascade inhibitors including clusterin, which is necessary for the anti-inflammatory effects of runner plasma. Cognitively impaired patients participating in structured exercise for 6 months showed higher plasma clusterin levels, which correlated positively with improvements in endurance and aerobic capacity. These findings demonstrate the existence of anti-inflammatory 'exercise factors' that are transferrable, benefit the brain, and are present in humans engaging in exercise.
453 downloads neuroscience
The curse of dimensionality plagues models of reinforcement learning and decision-making. The process of abstraction solves this by constructing abstract variables describing features shared by different specific instances, reducing dimensionality and enabling generalization in novel situations. Here we characterized neural representations in monkeys performing a task where a hidden variable described the temporal statistics of stimulus-response-outcome mappings. Abstraction was defined operationally using the generalization performance of neural decoders across task conditions not used for training. This type of generalization requires a particular geometric format of neural representations. Neural ensembles in dorsolateral pre-frontal cortex, anterior cingulate cortex and hippocampus, and in simulated neural networks, simultaneously represented multiple hidden and explicit variables in a format reflecting abstraction. Task events engaging cognitive operations modulated this format. These findings elucidate how the brain and artificial systems represent abstract variables, variables critical for generalization that in turn confers cognitive flexibility.
448 downloads plant biology
Hirotaka Kato, Sumanth Mutte, Hidemasa Suzuki, Isidro Crespo, Shubhajit Das, Tatyana Radoeva, Mattia Fontana, Yoshihiro Yoshitake, Emi Hainiwa, Willy van den Berg, Simon Lindhoud, Johannes Hohlbein, Jan Willem Borst, Roeland Boer, Ryuichi Nishihama, Takayuki Kohchi, Dolf Weijers
Auxin controls numerous growth processes in land plants through a gene expression system that modulates ARF transcription factor activity. Gene duplications in families encoding auxin response components have generated tremendous complexity in most land plants, and neofunctionalization enabled various unique response outputs during development. However, it is unclear what fundamental biochemical principles underlie this complex response system. By studying the minimal system in Marchantia polymorpha, we derive an intuitive and simple model where a single auxin-dependent A-ARF activates gene expression. It is antagonized by an auxin-independent B-ARF that represses common target genes. Expression patterns of both ARF proteins define developmental zones where auxin response is permitted, quantitatively tuned, or prevented. This fundamental design likely represents the ancestral system, and formed the basis for inflated, complex systems.
445 downloads bioengineering
Although a wide variety of quantum computers are currently being developed, actual computational results have been largely restricted to contrived, artificial tasks. Finding ways to apply quantum computers to useful, real-world computational tasks remains an active research area. Here we describe our mapping of the protein design problem to the D-Wave quantum annealer. We present a system whereby Rosetta, a state-of-the-art protein design software suite, interfaces with the D-Wave quantum processing unit to find amino acid side chain identities and conformations to stabilize a fixed protein backbone. Our approach, which we call the QPacker, uses a large side-chain rotamer library and the full Rosetta energy function, and in no way reduces the design task to a simpler format. We demonstrate that quantum-annealer-based design can be applied to complex real-world design tasks, producing designed molecules comparable to those produced by widely adopted classical design approaches. We also show through large-scale classical folding simulations that the results produced on the quantum annealer can inform wet-lab experiments. For design tasks that scale exponentially on classical computers, the QPacker achieves nearly constant runtime performance, independent of the complexity of the task, up to the limits of the quantum computer's size.
443 downloads biophysics
Here, we describe the third major release of RELION. CPU-based vector acceleration has been added in addition to GPU support, which provides flexibility in use of resources and avoids memory limitations. Reference-free autopicking with Laplacian-of-Gaussian filtering and execution of jobs from Python allow non-interactive processing during acquisition, including 2D-classification, de novo model generation and 3D-classification. Per-particle refinement of CTF parameters and correction of estimated beam tilt provide higher-resolution reconstructions when particles are at different heights in the ice, and/or coma-free alignment has not been optimal. Ewald sphere curvature correction improves resolution for large particles. We illustrate these developments with publicly available data sets: together with a Bayesian approach to beam-induced motion correction, they lead to resolution improvements of 0.2-0.7 Å compared to previous RELION versions.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!