Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 57,441 bioRxiv papers from 264,501 authors.
Most downloaded bioRxiv papers, since beginning of last month
56,052 results found.
1,219 downloads systems biology
Modern cytometry methods allow collecting complex, multi-dimensional data sets from heterogeneous cell populations at single-cell resolution. While methods exist to describe the progression and order of cellular processes from snapshots of such populations, these descriptions are limited to arbitrary pseudotime scales. Here we describe MAPiT, a universal transformation method that recovers real-time dynamics of cellular processes from pseudotime scales. As use cases, we applied MAPiT to two prominent problems in the flow-cytometric analysis of heterogeneous cell populations: (1) recovering the kinetics of cell cycle progression in unsynchronized and thus unperturbed cell populations, and (2) recovering the spatial arrangement of cells within multi-cellular spheroids prior to spheroid dissociation for cytometric analysis. Since MAPiT provides a theoretical basis for the relation of pseudotime values to real temporal and spatial scales, it can be used broadly in the analysis of cellular processes with snapshot data from heterogeneous cell populations.
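As a rough illustration of mapping pseudotime onto a real-time scale, the sketch below covers only the cell-cycle case, and it rests on a textbook assumption that does not come from the abstract: an unsynchronized, exponentially growing population with a fixed cycle length T, whose cell-age density is p(a) = (2 ln 2 / T) · 2^(−a/T). Under that assumption the fraction of cells younger than age a is F(a) = 2(1 − 2^(−a/T)), so a cell's rank along pseudotime can be inverted to an age. The function name and the mid-rank estimate of F are hypothetical choices; this is not the MAPiT implementation.

```python
import math


def pseudotime_to_age(pseudotimes, cycle_length):
    """Map pseudotime values to cell ages (same unit as cycle_length).

    Assumes a steady-state, exponentially growing population, so the
    cumulative age distribution is F(a) = 2 * (1 - 2 ** (-a / T)).
    """
    n = len(pseudotimes)
    # Order cells along pseudotime; only the ordering is used, not the scale.
    order = sorted(range(n), key=lambda i: pseudotimes[i])
    ages = [0.0] * n
    for rank, i in enumerate(order):
        # Mid-rank empirical CDF value, strictly between 0 and 1.
        f = (rank + 0.5) / n
        # Invert F(a) = f to obtain the age a.
        ages[i] = -cycle_length * math.log2(1.0 - f / 2.0)
    return ages


if __name__ == "__main__":
    # Arbitrary pseudotime values for five cells; a 24 h cycle is assumed.
    print(pseudotime_to_age([0.9, 0.1, 0.5, 0.3, 0.7], 24.0))
```

Because young cells are over-represented in a growing population, equal steps in rank correspond to progressively longer steps in age, which is the kind of non-linear rescaling an arbitrary pseudotime axis cannot show on its own.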
1,208 downloads biochemistry
Using mRNA-Seq and de novo transcriptome assembly, we identified, cloned and characterized nine previously undiscovered fluorescent protein (FP) homologs from Aequorea victoria and a related Aequorea species, with most sequences highly divergent from avGFP. Among these FPs are the brightest GFP homolog yet characterized and a reversibly photochromic FP that responds to UV and blue light. Beyond green emitters, Aequorea species express purple- and blue-pigmented chromoproteins (CPs) with absorbances ranging from green to far-red, including two that are photoconvertible. X-ray crystallography revealed that Aequorea CPs contain a chemically novel chromophore with an unexpected crosslink to the main polypeptide chain. Because of the unique attributes of several of these newly discovered FPs, we expect that Aequorea will, once again, give rise to an entirely new generation of useful probes for bioimaging and biosensing.
1,207 downloads molecular biology
We previously described a novel alternative to Chromatin Immunoprecipitation, Cleavage Under Targets & Release Using Nuclease (CUT&RUN), in which unfixed permeabilized cells are incubated with antibody, followed by binding of a Protein A-Micrococcal Nuclease (pA/MNase) fusion protein (1). Upon activation of tethered MNase, the bound complex is excised and released into the supernatant for DNA extraction and sequencing. Here we introduce four enhancements to CUT&RUN: 1) a hybrid Protein A-Protein G-MNase construct that expands antibody compatibility and simplifies purification; 2) a modified digestion protocol that inhibits premature release of the nuclease-bound complex; 3) a calibration strategy based on carry-over of E. coli DNA introduced with the fusion protein; and 4) a novel peak-calling strategy customized for the low-background profiles obtained using CUT&RUN. These new features, coupled with the previously described low-cost, high efficiency, high reproducibility and high-throughput capability of CUT&RUN, make it the method of choice for routine epigenomic profiling.
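The calibration idea in point 3 can be sketched in the generic form used for spike-in normalization: if a roughly constant amount of E. coli DNA accompanies each sample, coverage can be scaled by a factor inversely proportional to the number of reads mapping to the E. coli genome. The function names and the arbitrary constant below are illustrative assumptions, not the authors' exact procedure.

```python
def spike_in_scale_factor(ecoli_reads, constant=10_000):
    """Return a scale factor inversely proportional to E. coli read count.

    `constant` is an arbitrary multiplier (an assumption) chosen only to
    keep scaled values in a convenient range.
    """
    if ecoli_reads <= 0:
        raise ValueError("need at least one E. coli read to calibrate")
    return constant / ecoli_reads


def calibrate(coverage, ecoli_reads, constant=10_000):
    """Scale per-bin coverage values by the spike-in factor."""
    factor = spike_in_scale_factor(ecoli_reads, constant)
    return [value * factor for value in coverage]


if __name__ == "__main__":
    # Two hypothetical samples with identical raw coverage but different
    # numbers of E. coli reads end up on different calibrated scales.
    print(calibrate([10, 20], ecoli_reads=5_000))
    print(calibrate([10, 20], ecoli_reads=20_000))
```

A sample that yields fewer E. coli reads for the same raw coverage is scaled up, on the reasoning that the carried-over DNA makes up a smaller share of its library.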
1,203 downloads genomics
Nina J Mars, Jukka T. Koskela, Pietari Ripatti, Tuomo T.J. Kiiskinen, Aki S Havulinna, Joni V. Lindbohm, Ari Ahola-Olli, Mitja Kurki, Juha Karjalainen, Priit Palta, FinnGen, Benjamin M Neale, Mark Daly, Veikko Salomaa, Aarno Palotie, Elisabeth Widen, Samuli Ripatti
Background: Polygenic risk scores (PRS) have shown promise in predicting susceptibility to common diseases. However, the extent to which PRS and clinical risk factors act jointly and identify high-risk individuals for early onset of disease is unknown. Methods: We used large-scale biobank data (the FinnGen study; n=135,300), with up to 46 years of prospective follow-up, and the FINRISK study with standardized clinical risk factor measurements to build genome-wide PRSs with >6M variants for coronary heart disease (CHD), type 2 diabetes (T2D), atrial fibrillation (AF), and breast and prostate cancer. We evaluated their associations with first disease events, age at disease onset, and impact together with routinely used clinical risk scores for predicting future disease. Results: Compared to the 20-80th percentiles, a PRS in the top 2.5% translated into hazard ratios (HRs) for incident disease ranging from 2.03 to 4.28 (p-values 1.96×10⁻⁵⁹ to <1.00×10⁻¹⁰⁰) and the bottom 2.5% into HRs ranging from 0.20 to 0.61. The estimated difference in age at disease onset between top and bottom 2.5% of PRSs was 6 to 13 years. Among early-onset cases, 21.3-32.9% had a PRS in the highest decile and in CHD and AF. Conclusions: The properties of PRS were similar in all five diseases. PRS identified a considerable proportion of early-onset cases, and for all ages the performance of PRS was comparable to established clinical risk scores. These findings warrant further clinical studies on application of polygenic risk information for stratified screening or for guiding lifestyle and preventive medical interventions.
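For readers unfamiliar with the quantity being compared, a polygenic risk score is conventionally a weighted sum of allele dosages, and individuals are then placed in percentile bins of the cohort's score distribution (for example the top 2.5% versus the 20-80th percentile reference). The sketch below shows only that arithmetic; the function names, toy weights and dosages are hypothetical, and nothing here reproduces the study's variant weights or its hazard-ratio modelling.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def percentile_rank(score, cohort_scores):
    """Percentage of cohort scores strictly below `score`."""
    below = sum(1 for s in cohort_scores if s < score)
    return 100.0 * below / len(cohort_scores)


if __name__ == "__main__":
    # Toy per-variant effect weights and three individuals (all made up).
    weights = [0.10, 0.20, -0.05]
    cohort = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]
    scores = [polygenic_score(person, weights) for person in cohort]
    for score in scores:
        print(round(score, 3), percentile_rank(score, scores))
```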
1,200 downloads developmental biology
One of the earliest and most significant events in embryonic development is zygotic genome activation (ZGA). In several species, bulk transcription begins at the mid-blastula transition (MBT) when, after a certain number of cleavages, the embryo attains a particular nuclear-to-cytoplasmic (N/C) ratio, maternal repressors become sufficiently diluted, and the cell cycle slows down. Here we resolve the frog ZGA in time and space by profiling RNA polymerase II (RNAPII) engagement and its transcriptional readout. We detect a gradual increase in both the quantity and the length of RNAPII elongation before the MBT, revealing that >1,000 zygotic genes disregard the N/C timer for their activation, and that the sizes of newly transcribed genes are not necessarily constrained by cell cycle duration. We also find that Wnt, Nodal and BMP signaling together generate most of the spatio-temporal dynamics of regional ZGA, directing the formation of orthogonal body axes and proportionate germ layers.
1,184 downloads neuroscience
Our understanding of the link between neural activity and perception remains incomplete. Microstimulation and optogenetic experiments have shown that manipulating cortical activity can influence sensory-guided behaviour or elicit artificial percepts. And yet, some perceptual tasks can still be solved when sensory cortex is silenced or removed, suggesting that cortical activity may not always be essential. Reconciling these findings, and providing a quantitative framework linking cortical activity and behaviour, requires knowledge of the identity of the cells being activated during the behaviour, the engagement of the local and downstream networks, and the cortical and behavioural state. Here, we performed two-photon population calcium imaging in L2/3 primary visual cortex (V1) of headfixed mice performing a visual detection task while simultaneously activating specific groups of neurons using targeted two-photon optogenetics during low contrast visual stimulation. Only activation of groups of cells with similar tuning to the relevant visual stimulus led to a measurable bias of detection behaviour. Targeted photostimulation revealed signatures of centre-surround, predominantly inhibitory and like-to-like connectivity motifs in the local network which shaped the visual stimulus representation and partially explained the change in stimulus detectability. Moreover, the behavioural effects depended on overall performance: when the task was challenging for the mouse, V1 activity was more closely linked to performance, and cortical stimulation boosted perception. In contrast, when the task was easy, V1 activity was less informative about performance and cortical stimulation suppressed stimulus detection. Altogether, we find that both the selective routing of information through functionally specific circuits, and the prevailing cortical state, make similarly large contributions to explaining the behavioural response to photostimulation. 
Our results thus help to reconcile contradictory findings about the involvement of primary sensory cortex in behavioural tasks, suggesting that the influence of cortical activity on behaviour is dynamically reassigned depending on the demands of the task.
1,181 downloads biophysics
HIV-1 Gag protein self-assembles at the plasma membrane of infected cells for viral particle formation. Gag targets lipids, mainly phosphatidylinositol (4,5) bisphosphate, at the inner leaflet of this membrane. Here, we address the question of whether Gag is able to specifically trap PI(4,5)P2 or other lipids during HIV-1 assembly in the host CD4+ T lymphocytes. Lipid dynamics within and away from HIV-1 assembly sites were determined using super-resolution STED microscopy coupled with scanning Fluorescence Correlation Spectroscopy in living T cells. Analysis of HIV-1 infected cells revealed that, upon assembly, HIV-1 is able to specifically trap PI(4,5)P2 and cholesterol, but not phosphatidylethanolamine or sphingomyelin. Furthermore, our data show that Gag is the main driving force to restrict PI(4,5)P2 and cholesterol mobility at the cell plasma membrane. This is the first direct evidence showing that HIV-1 creates its own specific lipid environment by selectively recruiting PI(4,5)P2 and cholesterol, as a membrane nano-platform for virus assembly.
1,178 downloads developmental biology
Size trade-offs of visual versus olfactory organs are a pervasive feature of animal evolution. Comparing Drosophila species, we find that larger eyes correlate with smaller antennae, where olfactory organs reside, and narrower faces. We demonstrate that this trade-off arises through differential subdivision of the head primordium into visual versus non-visual fields. Specification of the visual field requires a highly conserved eye development gene called eyeless in flies and Pax6 in humans. We discover that changes in the temporal regulation of eyeless expression during development are a conserved mechanism for sensory trade-offs within and between Drosophila species. We identify a natural single nucleotide polymorphism in the cis-regulatory region of eyeless that is sufficient to alter its temporal regulation and eye size. Because Pax6 is a conserved regulator of sensory placode subdivision, we propose that alterations in the mutual repression between sensory territories are a conserved mechanism for sensory trade-offs in animals.
1,161 downloads genomics
Jordan A Ramilowski, Chi Wai Yip, Saumya Agrawal, Jen-Chien Chang, Yari Ciani, Ivan V Kulakovskiy, Mickael Mendez, Jasmine Li Ching Ooi, Andreas Petri, Leonie Roos, Jessica Severin, Kayoko Yasuzawa, John F Ouyang, Nick Parkinson, Imad Abugessaisa, Altuna Akalin, Ivan Antonov, Erik Arner, Alessandro Bonetti, Hidemasa Bono, Beatrice Borsari, Frank Brombacher, Carlo Cannistraci, Christopher JF CAMERON, Ryan Cardenas, Melissa Cardon, Howard Chang, Josée Dostie, Luca Ducoli, Alexander Favorov, Alexandre Fort, Diego Garrido, Noa Gil, Juliette Gimenez, Reto Guler, Lusy Handoko, Jayson Harshbarger, Akira Hasegawa, Yuki Hasegawa, Kosuke Hashimoto, Norihito Hayatsu, Peter Heutink, Tetsuro Hirose, Eddie L. Imada, Masayoshi Itoh, Bogumil Kaczkowski, Aditi Kanhere, Emily Kawabata, Hideya Kawaji, Tsugumi Kawashima, Tom Kelly, Miki Kojima, Naoto Kondo, Haruhiko Koseki, Tsukasa Kouno, Anton Kratz, Mariola Kurowska-Stolarska, Andrew Tae-Jun Kwon, Jeffrey Leek, Andreas Lennartsson, Marina Lizio, Fernando Lopez, Joachim Luginbuehl, Shiori Maeda, Vsevolod Makeev, Luigi Marchionni, Yulia A. Medvedeva, Aki Minoda, Ferenc Müller, Manuel Munoz Aguirre, Mitsuyoshi Murata, Hiromi Nishiyori, Kazuhiro Nitta, Shuhei Noguchi, Yukihiko Noro, Ramil Nurtdinov, Yasushi Okazaki, Valerio Orlando, Denis Paquette, Callum Parr, Owen J.L. Rackham, Patrizia Rizzu, Diego Fernando Sanchez, Albin Sandelin, Pillay Sanjana, Colin A.M. Semple, Harshita Sharma, Youtaro Shibayama, Divya Sivaraman, Takahiro Suzuki, Susanne Szumowski, Michihira Tagami, Martin S Taylor, Chikashi Terao, Malte Thodberg, Supat Thongjuea, Vidisha Tripathi, Igor Ulitsky, Roberto Verardo, Ilya Vorontsov, Chinatsu Yamamoto, Robert S. Young, John Kenneth Baillie, Alistair R.R. Forrest, Roderic Guigó, Michael M. Hoffman, Chung-Chau Hon, Takeya Kasukawa, Sakari Kauppinen, Juha Kere, Boris Lenhard, Claudio Schneider, Harukazu Suzuki, Ken Yagi, Michiel de Hoon, Jay W Shin, Piero Carninci
Long non-coding RNAs (lncRNAs) constitute the majority of transcripts in mammalian genomes and yet, their functions remain largely unknown. We systematically suppressed 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). The resulting transcriptomic profiles recapitulated the observed cellular phenotypes, yielding specific roles for over 40% of analyzed lncRNAs in regulating distinct biological pathways, transcriptional machinery, alternative promoter activity and architecture usage. Overall, combining cellular and molecular profiling provided a powerful approach to unravel the distinct functions of lncRNAs, which we highlight with specific functional roles for ZNF213-AS1 and lnc-KHDC3L-2.
1,137 downloads bioinformatics
Analysis of single-cell RNA-seq data begins with pre-processing of sequencing reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto (<https://pachterlab.github.io/kallisto/>) and bustools (<https://bustools.github.io/>) programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses. Documentation and tutorials for using the kallisto | bus workflow are available at <https://www.kallistobus.tools/>.
The bone marrow (BM) constitutes the primary site for life-long blood production and skeletal regeneration. However, its cellular composition and the spatial organization into distinct niches remain controversial. Here, we combine single-cell and spatially resolved transcriptomics to systematically map the molecular and cellular composition of the endosteal, sinusoidal, and arteriolar BM niches. This allowed us to transcriptionally profile all major BM resident cell types, determine their localization, and clarify the cellular and spatial sources of key growth factors and cytokines. Our data demonstrate that previously unrecognized Cxcl12-abundant reticular (CAR) cell subsets (i.e. Adipo- and Osteo-CAR cells) differentially localize to sinusoidal or arteriolar surfaces, locally act as professional cytokine-secreting cells, and thereby establish distinct peri-vascular micro-niches. Importantly, we also demonstrate that the 3-dimensional organization of the BM can be accurately inferred from single-cell gene expression data using the newly developed RNA-Magnet algorithm. Together, our study reveals the cellular and spatial organization of BM niches, and offers a novel strategy to dissect the complex organization of whole organs in a systematic manner.
1,108 downloads genomics
Chromosomes are folded so that active and inactive chromatin domains are spatially segregated. Compartmentalization is thought to occur through polymer phase/microphase separation mediated by interactions between loci of similar type. The nature and dynamics of these interactions are not known. We developed liquid chromatin Hi-C to map the stability of associations between loci. Before fixation and Hi-C, chromosomes are fragmented removing the strong polymeric constraint to enable detection of intrinsic locus-locus interaction stabilities. Compartmentalization is stable when fragments are over 10-25 kb. Fragmenting chromatin into pieces smaller than 6 kb leads to gradual loss of genome organization. Dissolution kinetics of chromatin interactions vary for different chromatin domains. Lamin-associated domains are most stable, while interactions among speckle and polycomb-associated loci are more dynamic. Cohesin-mediated loops dissolve after fragmentation, possibly because cohesin rings slide off nearby DNA ends. Liquid chromatin Hi-C provides a genome-wide view of chromosome interaction dynamics.
1,086 downloads immunology
Comprehensive profiling of the human immune system in patients with cancer or autoimmune disease and during infections is providing valuable information that helps us understand disease states, discriminate productive from inefficient immune responses, and identify possible targets for immune modulation. Recent technical advances now allow all immune cell populations and hundreds of plasma proteins to be detected using small-volume blood samples. To democratize such systems-immunological analyses, further simplified blood sampling and preservation will be important. Here we show that 100 microliters of capillary blood, obtained via a nearly painless self-sampling device and then preserved and frozen, can simplify systems-level immunomonitoring studies.
1,077 downloads evolutionary biology
Although estimated to have emerged in humans in Central Africa in the early 1900s, HIV-1, the main causative agent of AIDS, was only discovered in 1983. With very little direct biological data of HIV-1 from before the 1980s, far-reaching evolutionary and epidemiological inferences regarding the long pre-discovery phase of this pandemic are based on extrapolations by phylodynamic models of HIV-1 genomic sequences gathered mostly over recent decades. Here, using a very sensitive multiplex RT-PCR assay, we screened 1,652 formalin-fixed paraffin-embedded tissue specimens collected for pathology diagnostics in Kinshasa, Democratic Republic of Congo (DRC), between 1959 and 1967. We report the near-complete genome of one positive from 1966 ('DRC66') - a non-recombinant sister lineage to subtype C that constitutes the oldest HIV-1 near-full-length genome recovered to date. Root-to-tip plots showed the DRC66 sequence is not an outlier as would be expected if dating estimates from more recent genomes were systematically biased; and inclusion of DRC66 sequence in tip-dated BEAST analyses did not significantly alter root and internal node age estimates based on post-1978 HIV-1 sequences. There was larger variation in divergence time estimates among datasets that were subsamples of the available HIV-1 genomes from 1978-2015, showing the inherent phylogenetic stochasticity across subsets of the real HIV-1 diversity. In conclusion, this unique archival HIV-1 sequence provides direct genomic insight into HIV-1 in 1960s DRC, and, as an ancient-DNA calibrator, it validates our understanding of HIV-1 evolutionary history.
1,055 downloads molecular biology
Rapid perturbation of protein function makes it possible to define primary molecular responses while avoiding downstream cumulative effects of protein dysregulation. The auxin-inducible degron (AID) system was developed as a tool to achieve rapid and inducible protein degradation in non-plant systems. However, tagging proteins at their endogenous loci results in chronic, auxin-independent degradation by the proteasome. To correct this deficiency, we expressed the Auxin Response Transcription Factor (ARF) in an improved inducible degron system. ARF is absent from previously engineered AID systems, but ARF is a critical component of native auxin signaling. In plants, ARF directly interacts with AID in the absence of auxin and we found that expression of the ARF Phox and Bem1 (PB1) domain suppresses constitutive degradation of AID-tagged proteins. Moreover, the rate of auxin-induced AID degradation is substantially faster in the ARF-AID system. To test the ARF-AID system in a quantitative and sensitive manner, we measured genome-wide changes in nascent transcription after rapidly depleting the ZNF143 transcription factor. Transcriptional profiling indicates that ZNF143 activates transcription in cis and that ZNF143 regulates promoter-proximal paused RNA Polymerase density. Rapidly inducible degradation systems that preserve the target protein's native expression levels and patterns will revolutionize the study of biological systems by enabling specific and temporally defined protein dysregulation.
1,047 downloads systems biology
Maximilian Strunz, Lukas M Simon, Meshal Ansari, Laura F Mattner, Ilias Angelidis, Christoph H Mayr, Jaymin Kathiriya, Min Yee, Paulina Ogar, Arunima Sengupta, Igor Kukhtevich, Robert Schneider, Zhongming Zhao, Jens H.L. Neumann, Juergen Behr, Carola Voss, Tobias Stoeger, Mareike Lehmann, Melanie Koenigshoff, Gerald Burgstaller, Michael O'Reilly, Harold A. Chapman, Fabian J. Theis, Herbert B Schiller
Lung injury activates quiescent stem and progenitor cells to regenerate alveolar structures. The sequence and coordination of transcriptional programs during this process have largely remained elusive. Using single cell RNA-seq, we first generated a whole-organ bird's-eye view on cellular dynamics and cell-cell communication networks during mouse lung regeneration from ~30,000 cells at six timepoints. We discovered an injury-specific progenitor cell state characterized by Krt8 in flat epithelial cells covering alveolar surfaces. The number of these cells peaked during fibrogenesis in independent mouse models, as well as in human acute lung injury and fibrosis. Krt8+ alveolar progenitors featured a highly distinct connectome of receptor-ligand pairs with endothelial cells, fibroblasts, and macrophages. To sky dive into epithelial differentiation dynamics, we sequenced >30,000 sorted epithelial cells at 18 timepoints and computationally derived cell state trajectories that were validated by lineage tracing genetic reporter mice. Airway stem cells within the club cell lineage and alveolar type-2 cells underwent transcriptional convergence onto the same Krt8+ progenitor cell state, which later resolved by terminal differentiation into alveolar type-1 cells. We derived distinct transcriptional regulators as key switch points in this process and show that induction of NFkB, p53, and hypoxia driven gene expression programs precedes a Sox4, Ctnnb1, and Wwtr1 driven commitment towards alveolar type-1 cell fate. We show that epithelial cell plasticity can induce non-gradual transdifferentiation, involving intermediate progenitor cell states that may persist and promote disease if checkpoint signals for terminal differentiation are perturbed.
1,025 downloads neuroscience
The dorsal striatum is organized into domains that drive characteristic behaviors, and receive inputs from different parts of the cortex which modulate similar behaviors. Striatal responses to cortical inputs, however, can be affected by changes in connection strength, local striatal circuitry, and thalamic inputs. Therefore, it is unclear whether the pattern of activity across striatal domains mirrors that across the cortex or differs from it. Here we use simultaneous large-scale recordings in the cortex and the striatum to show that striatal activity can be accurately predicted by spatiotemporal activity patterns in the cortex. The relationship between activity in the cortex and the striatum was spatially consistent with corticostriatal anatomy, and temporally consistent with a feedforward drive. Each striatal domain exhibited specific sensorimotor responses that predictably followed activity in the associated cortical regions, and the corticostriatal relationship remained unvaried during passive states or performance of a task probing visually guided behavior. However, the task's visual stimuli and corresponding behavioral responses evoked relatively more activity in the striatum than in associated cortical regions. This increased striatal activity involved an additive offset in firing rate, which was independent of task engagement but only present in animals that had learned the task. Thus, striatal activity largely reflects patterns of cortical activity, deviating from them in a simple additive fashion for learned stimuli or actions.
1,002 downloads neuroscience
Machine learning-based analysis of human functional magnetic resonance imaging (fMRI) patterns has enabled the visualization of perceptual content. However, it has been limited to the reconstruction with low-level image bases or to the matching to exemplars. Recent work showed that visual cortical activity can be decoded (translated) into hierarchical features of a deep neural network (DNN) for the same input image, providing a way to make use of the information from hierarchical visual features. Here, we present a novel image reconstruction method, in which the pixel values of an image are optimized to make its DNN features similar to those decoded from human brain activity at multiple layers. We found that the generated images resembled the stimulus images (both natural images and artificial shapes) and the subjective visual content during imagery. While our model was solely trained with natural images, our method successfully generalized the reconstruction to artificial shapes, indicating that our model indeed reconstructs or generates images from brain activity, not simply matches to exemplars. A natural image prior introduced by another deep neural network effectively rendered semantically meaningful details to reconstructions by constraining reconstructed images to be similar to natural images. Furthermore, human judgment of reconstructions suggests the effectiveness of combining multiple DNN layers to enhance visual quality of generated images. The results suggest that hierarchical visual information in the brain can be effectively combined to reconstruct perceptual and subjective images.
1,001 downloads microbiology
Many eukaryotic microbes have complex lifecycles that include both sexual and asexual phases with strict species-specificity. While the asexual cycle of the protistan parasite Toxoplasma gondii can occur in any warm-blooded mammal, the sexual cycle is restricted to the feline intestine [1]. The molecular determinants that identify cats as the definitive host for T. gondii are unknown. Here, we define the mechanism of species specificity for T. gondii sexual development and break the species barrier to allow the sexual cycle to occur in mice. We determined that T. gondii sexual development occurs when cultured feline intestinal epithelial cells are supplemented with linoleic acid. Felines are the only mammals that lack delta-6-desaturase activity in their intestines, which is required for linoleic acid metabolism, resulting in systemic excess of linoleic acid [2, 3]. We found that inhibition of murine delta-6-desaturase and supplementation of their diet with linoleic acid allowed T. gondii sexual development in mice. This mechanism of species specificity is the first defined for a parasite sexual cycle. This work highlights how host diet and metabolism shape coevolution with microbes. The key to unlocking the species boundaries for other eukaryotic microbes may also rely on the lipid composition of their environments as we see increasing evidence for the importance of host lipid metabolism during parasitic lifecycles [4, 5]. Pregnant women are advised against handling cat litter as maternal infection with T. gondii can be transmitted to the fetus with potentially lethal outcomes. Knowing the molecular components that create a conducive environment for T. gondii sexual reproduction will allow for development of therapeutics that prevent shedding of T. gondii parasites. Finally, given the current reliance on companion animals to study T. gondii sexual development, this work will allow the T. gondii field to use alternative models in future studies.
995 downloads bioinformatics
Accurate prediction of protein structure is one of the central challenges of biochemistry. Despite significant progress made by co-evolution methods to predict protein structure from signatures of residue-residue coupling found in the evolutionary record, a direct and explicit mapping between protein sequence and structure remains elusive, with no substantial recent progress. Meanwhile, rapid developments in deep learning, which have found remarkable success in computer vision, natural language processing, and quantum chemistry raise the question of whether a deep learning based approach to protein structure could yield similar advancements. A key ingredient of the success of deep learning is the reformulation of complex, human-designed, multi-stage pipelines with differentiable models that can be jointly optimized end-to-end. We report the development of such a model, which reformulates the entire structure prediction pipeline using differentiable primitives. Achieving this required combining four technical ideas: (1) the adoption of a recurrent neural architecture to encode the internal representation of protein sequence, (2) the parameterization of (local) protein structure by torsional angles, which provides a way to reason over protein conformations without violating the covalent chemistry of protein chains, (3) the coupling of local protein structure to its global representation via recurrent geometric units, and (4) the use of a differentiable loss function to capture deviations between predicted and experimental structures. To our knowledge this is the first end-to-end differentiable model for learning of protein structure. We test the effectiveness of this approach using two challenging tasks: the prediction of novel protein folds without the use of co-evolutionary information, and the prediction of known protein folds without the use of structural templates. 
On the first task the model achieves state-of-the-art performance, even when compared to methods that rely on co-evolutionary data. On the second task the model is competitive with methods that use experimental protein structures as templates, achieving 3-7Å accuracy despite being template-free. Beyond protein structure prediction, end-to-end differentiable models of proteins represent a new paradigm for learning and modeling protein structure, with potential applications in docking, molecular dynamics, and protein design.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!