Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 63,068 bioRxiv papers from 279,747 authors.
Most downloaded bioRxiv papers, since beginning of last month
61,421 results found.
1,875 downloads genomics
The allocation of a sequencing budget when designing single cell RNA-seq experiments requires consideration of the tradeoff between number of cells sequenced and the read depth per cell. One approach to the problem is to perform a power analysis for a univariate objective such as differential expression. However, many of the goals of single-cell analysis, such as clustering, require consideration of the multivariate structure of gene expression. We introduce an approach to quantifying the impact of sequencing depth and cell number on the estimation of a multivariate generative model for gene expression that is based on error analysis in the framework of a variational autoencoder. We find that at shallow depths, the marginal benefit of deeper sequencing per cell significantly outweighs the benefit of increased cell numbers. Above about 15,000 reads per cell the benefit of increased sequencing depth is minor. Code for the workflow reproducing the results of the paper is available at https://github.com/pachterlab/SBP_2019/.
1,835 downloads molecular biology
Yuancheng Lu, Anitha Krishnan, Benedikt Brommer, Xiao Tian, Margarita Meer, Daniel L. Vera, Chen Wang, Qiurui Zeng, Doudou Yu, Michael S. Bonkowski, Jae-Hyun Yang, Emma M. Hoffmann, Songlin Zhou, Ekaterina Korobkina, Noah Davidsohn, Michael B. Schultz, Karolina Chwalek, Luis A. Rajman, George M Church, Konrad Hochedlinger, Vadim N Gladyshev, Steve Horvath, Meredith S. Gregory-Ksander, Bruce R. Ksander, Zhigang He, David A. Sinclair
Ageing is a degenerative process leading to tissue dysfunction and death. A proposed cause of ageing is the accumulation of epigenetic noise, which disrupts youthful gene expression patterns that are required for cells to function optimally and recover from damage. Changes to DNA methylation patterns over time form the basis of an 'ageing clock', but whether old individuals retain information to reset the clock and, if so, whether this would improve tissue function is not known. Of all the tissues in the body, the central nervous system (CNS) is one of the first to lose regenerative capacity. Using the eye as a model tissue, we show that expression of Oct4, Sox2, and Klf4 genes (OSK) in mice resets youthful gene expression patterns and the DNA methylation age of retinal ganglion cells, promotes axon regeneration after optic nerve crush injury, and restores vision in a mouse model of glaucoma and in normal old mice. This process, which we call recovery of information via epigenetic reprogramming or REVIVER, requires the DNA demethylases Tet1 and Tet2, indicating that DNA methylation patterns don't just indicate age, they participate in ageing. Thus, old tissues retain a faithful record of youthful epigenetic information that can be accessed for functional age reversal.
1,701 downloads genomics
Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from 'regularized negative binomial regression', where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure obviates the need for heuristic steps including pseudocount addition or log-transformation, and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform (https://github.com/ChristophH/sctransform), with a direct interface to our single-cell toolkit Seurat.
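The residual described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not sctransform's implementation: it assumes the per-gene intercepts, depth slopes, and dispersions (`beta0`, `beta1`, `theta`, hypothetical names) have already been estimated and regularized across genes, models the expected count as exp(beta0 + beta1 * log10 of the cell's total UMIs), and divides each deviation by the negative binomial standard deviation sqrt(mu + mu^2/theta). The clipping bound of +/- sqrt(number of cells) is an added assumption.

```python
import numpy as np


def nb_pearson_residuals(counts, beta0, beta1, theta, clip=True):
    """Pearson residuals under a negative binomial GLM with depth as covariate.

    counts: genes x cells array of UMI counts (every cell must have > 0 UMIs).
    beta0, beta1, theta: hypothetical per-gene arrays, assumed to be already
        estimated and regularized across genes of similar abundance.
    """
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=0)          # total UMIs per cell
    log_depth = np.log10(depth)         # sequencing-depth covariate
    # Expected count per gene and cell: exp(beta0_g + beta1_g * log10(depth_c))
    mu = np.exp(beta0[:, None] + beta1[:, None] * log_depth[None, :])
    # Negative binomial variance: mu + mu^2 / theta
    var = mu + mu ** 2 / theta[:, None]
    z = (counts - mu) / np.sqrt(var)
    if clip:
        # Assumed clipping range: +/- sqrt(number of cells)
        bound = np.sqrt(counts.shape[1])
        z = np.clip(z, -bound, bound)
    return z
```

Cells with zero total UMIs must be filtered out first, since log10(0) is undefined.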
1,689 downloads bioinformatics
Histopathological images are essential for the diagnosis of cancer type and selection of optimal treatment. However, the current clinical process of manual inspection of images is time consuming and prone to intra- and inter-observer variability. Here we show that key aspects of cancer image analysis can be performed by deep convolutional neural networks (CNNs) across a wide spectrum of cancer types. In particular, we implement CNN architectures based on Google Inception v3 transfer learning to analyze 27,815 H&E slides from 23 cohorts in The Cancer Genome Atlas in studies of tumor/normal status, cancer subtype, and mutation status. For 19 solid cancer types we are able to classify tumor/normal status of whole slide images with extremely high AUCs (0.995±0.008). We are also able to classify cancer subtypes within 10 tissue types with AUC values well above random expectations (micro-average 0.87±0.1). We then perform a cross-classification analysis of tumor/normal status across tumor types. We find that classifiers trained on one type are often effective in distinguishing tumor from normal in other cancer types, with the relationships among classifiers matching known cancer tissue relationships. For the more challenging problem of mutational status, we are able to classify TP53 mutations in three cancer types with AUCs of 0.65 to 0.80 using a fully-trained CNN, and with similar cross-classification accuracy across tissues. These studies demonstrate the power of CNNs not only for classifying histopathological images in diverse cancer types, but also for revealing shared biology between tumors. We have made software available at: https://github.com/javadnoorb/HistCNN
1,668 downloads neuroscience
The hippocampal-entorhinal system is important for spatial and relational memory tasks. We formally link these domains; provide a mechanistic understanding of the hippocampal role in generalisation; and offer unifying principles underlying many entorhinal and hippocampal cell-types. We propose medial entorhinal cells form a basis describing structural knowledge, and hippocampal cells link this basis with sensory representations. Adopting these principles, we introduce the Tolman-Eichenbaum machine (TEM). After learning, TEM entorhinal cells include grid, band, border and object-vector cells. Hippocampal cells include place and landmark cells, remapping between environments. Crucially, TEM also predicts empirically recorded representations in complex non-spatial tasks. TEM predicts that hippocampal remapping is not random, as previously believed. Rather, structural knowledge is preserved across environments. We confirm this in simultaneously recorded place and grid cells. One Sentence Summary: Simple principles of representation and generalisation unify spatial and non-spatial accounts of hippocampus and explain many cell representations.
1,565 downloads genomics
Kyle J Travaglini, Ahmad N Nabhan, Lolita Penland, Rahul Sinha, Astrid Gillich, Rene V Sit, Stephen Chang, Stephanie D Conley, Yasuo Mori, Jun Seita, Gerald J. Berry, Joseph B Shrager, Ross J Metzger, Christin S Kuo, Norma Neff, Irving L Weissman, Stephen R. Quake, Mark A Krasnow
Although single cell RNA sequencing studies have begun providing compendia of cell expression profiles, it has proven more difficult to systematically identify and localize all molecular cell types in individual organs to create a full molecular cell atlas. Here we describe droplet- and plate-based single cell RNA sequencing applied to ~70,000 human lung and blood cells, combined with a multi-pronged cell annotation approach, which have allowed us to define the gene expression profiles and anatomical locations of 58 cell populations in the human lung, including 41 of 45 previously known cell types or subtypes and 14 new ones. This comprehensive molecular atlas elucidates the biochemical functions of lung cell types and the cell-selective transcription factors and optimal markers for making and monitoring them; defines the cell targets of circulating hormones and predicts local signaling interactions including sources and targets of chemokines in immune cell trafficking and expression changes on lung homing; and identifies the cell types directly affected by lung disease genes. Comparison to mouse identified 17 molecular types that appear to have been gained or lost during lung evolution and others whose expression profiles have been substantially altered, revealing extensive plasticity of cell types and cell-type-specific gene expression during organ evolution including expression switches between cell types. This lung atlas provides the molecular foundation for investigating how lung cell identities, functions, and interactions are achieved in development and tissue engineering and altered in disease and evolution.
1,520 downloads bioengineering
Access to quantitative, robust, yet affordable diagnostic tools is necessary to reduce global infectious disease burden. Manual microscopy has served as a bedrock for diagnostics with wide adaptability, although at the cost of tedious labor and human errors. Automated robotic microscopes are poised to enable a new era of smart field microscopy, but current platforms remain cost prohibitive and largely inflexible, especially for resource-poor and field settings. Here we present Octopi, a low-cost ($250-$500) and reconfigurable autonomous microscopy platform capable of automated slide scanning and correlated bright-field and fluorescence imaging. Being highly modular, it also provides a framework for new disease-specific modules to be developed. We demonstrate the power of the platform by applying it to automated detection of malaria parasites in blood smears. Specifically, we discovered a spectral shift on the order of 10 nm for DAPI-stained Plasmodium falciparum malaria parasites. This shift allowed us to detect the parasites with a low magnification (equivalent to 10x) large field of view (2.56 mm^2) module. Combined with automated slide scanning, real time computer vision and machine learning-based classification, Octopi is able to screen more than 1.5 million red blood cells per minute for parasitemia quantification, with estimated diagnostic sensitivity and specificity exceeding 90% at parasitemia of 50/μl and 100% for parasitemia higher than 150/μl. With different modules, we further demonstrated imaging of tissue slices and sputum samples on the platform. With roughly two orders of magnitude in cost reduction, Octopi opens up the possibility of a large robotic microscope network for improved disease diagnosis while providing an avenue for collective efforts for development of modular instruments.
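To put the throughput figure in context, the number of infected cells encountered per minute scales with parasitemia. The sketch below is a back-of-the-envelope estimate, not a calculation from the paper: it assumes a red blood cell density of about 5 million per μl (a typical adult value, not stated in the abstract) and one parasite per infected cell.

```python
RBC_PER_UL = 5.0e6   # assumption: typical adult red blood cell density per μl
SCAN_RATE = 1.5e6    # red blood cells screened per minute (from the abstract)


def infected_cells_per_minute(parasitemia_per_ul, scan_rate=SCAN_RATE, rbc_per_ul=RBC_PER_UL):
    """Expected infected cells scanned per minute, assuming one parasite per infected cell."""
    # Fraction of red blood cells that are infected at this parasitemia
    infected_fraction = parasitemia_per_ul / rbc_per_ul
    return infected_fraction * scan_rate


for level in (50, 150):
    print(level, round(infected_cells_per_minute(level), 1))
```

Under these assumptions, a one-minute scan at 50 parasites/μl would encounter roughly 15 infected cells, and roughly 45 at 150/μl.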
1,511 downloads genomics
Michal Slyper, Caroline B. M. Porter, Orr Ashenberg, Julia Waldman, Eugene Drokhlyansky, Isaac Wakiro, Christopher Smillie, Gabriela Smith-Rosario, Jingyi Wu, Danielle Dionne, Sébastien Vigneau, Judit Jané-Valbuena, Sara Napolitano, Mei-Ju Su, Anand G. Patel, Asa Karlstrom, Simon Gritsch, Masashi Nomura, Avinash Waghray, Satyen H. Gohil, Alexander M. Tsankov, Livnat Jerby-Arnon, Ofir Cohen, Johanna Klughammer, Yanay Rosen, Joshua Gould, Bo Li, Lan Nguyen, Catherine J Wu, Benjamin Izar, Rizwan Haq, F. Stephen Hodi, Charles H. Yoon, Aaron N. Hata, Suzanne J. Baker, Mario L. Suvà, Raphael Bueno, Elizabeth H. Stover, Ursula A. Matulonis, Michael R. Clay, Micheal A. Dyer, Natalie B. Collins, Nikhil Wagle, Asaf Rotem, Bruce E. Johnson, Orit Rozenblatt-Rosen, Aviv Regev
Single cell genomics is essential to chart the complex tumor ecosystem. While single cell RNA-Seq (scRNA-Seq) profiles RNA from cells dissociated from fresh tumor tissues, single nucleus RNA-Seq (snRNA-Seq) is needed to profile frozen or hard-to-dissociate tumors. Each strategy requires modifications to fit the unique characteristics of different tissue and tumor types, posing a barrier to adoption. Here, we developed a systematic toolbox for profiling fresh and frozen clinical tumor samples using scRNA-Seq and snRNA-Seq, respectively. We tested eight tumor types of varying tissue and sample characteristics (resection, biopsy, ascites, and orthotopic patient-derived xenograft): lung cancer, metastatic breast cancer, ovarian cancer, melanoma, neuroblastoma, pediatric sarcoma, glioblastoma, pediatric high-grade glioma, and chronic lymphocytic leukemia. Analyzing 212,498 cells and nuclei from 39 clinical samples, we evaluated protocols by cell quality, recovery rate, and cellular composition. We optimized protocols for fresh tissue dissociation for different tumor types using a decision tree to account for the technical and biological variation between clinical samples. We established methods for nucleus isolation from OCT embedded and fresh-frozen tissues, with an optimization matrix varying mechanical force, buffer, and detergent. scRNA-Seq and snRNA-Seq from matched samples recovered the same cell types and intrinsic expression profiles, but at different proportions. Our work provides direct guidance across a broad range of tumors, including criteria for testing and selecting methods from the toolbox for other tumors, thus paving the way for charting tumor atlases.
1,384 downloads genomics
Konrad Karczewski, Laurent C Francioli, Grace Tiao, Beryl B Cummings, Jessica Alföldi, Qingbo Wang, Ryan L Collins, Kristen M Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A Watts, Daniel Rhodes, Moriel Singer-Berk, Eleina M England, Eleanor G Seaby, Jack A. Kosmicki, Raymond K Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S Ware, Christopher Vittal, Irina M Armean, Louis Bergelson, Kristian Cibulskis, Kristen M Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, The Genome Aggregation Database Consortium, Benjamin M Neale, Mark J. Daly, Daniel G. MacArthur
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved human mutation rate model, we classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
1,312 downloads genomics
Eugene Drokhlyansky, Christopher S. Smillie, Nicholas Van Wittenberghe, Maria Ericsson, Gabriel K. Griffin, Danielle Dionne, Michael S Cuoco, Max N. Goder-Reiser, Tatyana Sharova, Andrew J. Aguirre, Genevieve M. Boland, Daniel Graham, Orit Rozenblatt-Rosen, Ramnik J. Xavier, Aviv Regev
As the largest branch of the autonomic nervous system, the enteric nervous system (ENS) controls the entire gastrointestinal tract, but remains incompletely characterized. Here, we develop RAISIN RNA-seq, which enables the capture of intact single nuclei along with ribosome-bound mRNA, and use it to profile the adult mouse and human colon to generate a reference map of the ENS at a single-cell resolution. This map reveals an extraordinary diversity of neuron subsets across intestinal locations, ages, and circadian phases, with conserved transcriptional programs that are shared between human and mouse. These data suggest possible revisions to the current model of peristalsis and molecular mechanisms that may allow enteric neurons to orchestrate tissue homeostasis, including immune regulation and stem cell maintenance. Human enteric neurons specifically express risk genes for neuropathic, inflammatory, and extra-intestinal diseases with concomitant gut dysmotility. Our study therefore provides a roadmap to understanding the ENS in health and disease.
1,290 downloads neuroscience
Machine learning-based analysis of human functional magnetic resonance imaging (fMRI) patterns has enabled the visualization of perceptual content. However, it has been limited to reconstruction with low-level image bases or to matching against exemplars. Recent work showed that visual cortical activity can be decoded (translated) into hierarchical features of a deep neural network (DNN) for the same input image, providing a way to make use of the information from hierarchical visual features. Here, we present a novel image reconstruction method, in which the pixel values of an image are optimized to make its DNN features similar to those decoded from human brain activity at multiple layers. We found that the generated images resembled the stimulus images (both natural images and artificial shapes) and the subjective visual content during imagery. While our model was solely trained with natural images, our method successfully generalized the reconstruction to artificial shapes, indicating that our model indeed reconstructs or generates images from brain activity rather than simply matching them to exemplars. A natural image prior introduced by another deep neural network effectively rendered semantically meaningful details to reconstructions by constraining reconstructed images to be similar to natural images. Furthermore, human judgment of reconstructions suggests the effectiveness of combining multiple DNN layers to enhance visual quality of generated images. The results suggest that hierarchical visual information in the brain can be effectively combined to reconstruct perceptual and subjective images.
1,269 downloads bioinformatics
Kishwar Shafin, Trevor Pesout, Ryan Lorig-Roach, Marina Haukness, Hugh E Olsen, Colleen Bosworth, Joel Armstrong, Kristof Tigyi, Nicholas Maurer, Sergey Koren, Fritz J. Sedlazeck, Tobias Marschall, Simon Mayes, Vania Costa, Justin M Zook, Kelvin J Liu, Duncan Kilburn, Melanie Sorensen, Katy M Munson, Mitchell R. Vollger, Evan E Eichler, Sofie Salama, David Haussler, Richard E. Green, Mark Akeson, Adam Phillippy, Karen H Miga, Paolo Carnevali, Miten Jain, Benedict Paten
Present workflows for producing human genome assemblies from long-read technologies have cost and production time bottlenecks that prohibit efficient scaling to large cohorts. We demonstrate an optimized PromethION nanopore sequencing method for eleven human genomes. The sequencing, performed on one machine in nine days, achieved an average 63x coverage, 42 Kb read N50, 90% median read identity and 6.5x coverage in 100 Kb+ reads using just three flow cells per sample. To assemble these data we introduce new computational tools: Shasta - a de novo long read assembler, and MarginPolish & HELEN - a suite of nanopore assembly polishing algorithms. On a single commercial compute node Shasta can produce a complete human genome assembly in under six hours, and MarginPolish & HELEN can polish the result in just over a day, achieving 99.9% identity (QV30) for haploid samples from nanopore reads alone. We evaluate assembly performance for diploid, haploid and trio-binned human samples in terms of accuracy, cost, and time and demonstrate improvements relative to current state-of-the-art methods in all areas. We further show that addition of proximity ligation (Hi-C) sequencing yields near chromosome-level scaffolds for all eleven genomes.
1,247 downloads genomics
We developed Hackflex, a low-cost method for the production of Illumina-compatible sequencing libraries that allows up to 11 times more libraries for high-throughput Illumina sequencing to be generated at a fixed cost. Quality of library preparation was tested by constructing libraries from E. coli MG1655 genomic DNA using either Hackflex, standard Nextera Flex or a variation of standard Nextera Flex in which the bead-linked transposase is diluted prior to use. We demonstrated that Hackflex can produce high quality libraries and yields highly uniform coverage, equivalent to the standard Nextera Flex kit. Using Hackflex, we were able to achieve a per-sample reagent cost for library preparation of A$8.66, which is 8.23 times lower than the standard Nextera Flex protocol at advertised retail price. An additional simple modification to the protocol enables a further price reduction of up to 11-fold, or about A$6.50/sample. This method will allow researchers to construct more libraries within a given budget, thereby yielding more data and facilitating research programs where sequencing large numbers of libraries is beneficial.
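The two cost figures quoted above are mutually consistent, as a quick check shows. The standard Nextera Flex retail reagent cost per sample is not stated in the abstract; in this sketch it is derived from the A$8.66 figure and the 8.23-fold ratio, so treat it as an implied value rather than a reported one.

```python
HACKFLEX_COST = 8.66   # A$ per sample, as reported
FOLD_REDUCTION = 8.23  # Hackflex vs. standard Nextera Flex at advertised retail price
MAX_FOLD = 11          # "up to 11-fold" with the additional protocol modification

# Implied standard Nextera Flex reagent cost per sample (derived, not reported)
standard_cost = HACKFLEX_COST * FOLD_REDUCTION
# Per-sample cost after the full 11-fold reduction relative to standard
modified_cost = standard_cost / MAX_FOLD

print(f"standard: A${standard_cost:.2f}, modified: A${modified_cost:.2f}")
```

This prints `standard: A$71.27, modified: A$6.48`, in line with the "about A$6.50/sample" figure.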
1,190 downloads genomics
Xin Jin, Sean K Simmons, Amy X Guo, Ashwin S Shetty, Michelle Ko, Lan Nguyen, Elise B Robinson, Paul Oyler, Nathan Curry, Giulio Deangeli, Simona Lodato, Joshua Z Levin, Aviv Regev, Feng Zhang, Paola Arlotta
The thousands of disease risk genes and loci identified through human genetic studies far outstrip our current capacity to systematically study their functions. New experimental approaches are needed for functional investigations of large panels of genes in a biologically relevant context. Here, we developed a scalable genetic screen approach, in vivo Perturb-Seq, and applied this method to the functional evaluation of 35 autism spectrum disorder (ASD) de novo loss-of-function risk genes. Using CRISPR-Cas9, we introduced frameshift mutations in these risk genes in pools, within the developing brain in utero, and then performed single-cell RNA-Seq in the postnatal brain. We identified cell type-specific gene signatures from both neuronal and glial cell classes that are affected by genetic perturbations and pointed at elements of both convergent and divergent cellular effects across this cohort of ASD risk genes. In vivo Perturb-Seq pioneers a systems genetics approach to investigate at scale how diverse mutations affect cell types and states in the biologically relevant context of the developing organism.
1,190 downloads scientific communication and education
Every year for three years (2016 to 2018), I tried to identify every single person hired as a tenure-track professor in ecology or an allied field (e.g., fish & wildlife) in North America. I identified a total of 566 hires. I used public sources to compile various data on the new hires and the institutions that hired them (e.g., number of publications, teaching experience, hiring institution Carnegie class). I also compiled data provided by anonymous ecology faculty job seekers on ecoevojobs.net (e.g., number of positions applied for, number of publications, numbers of interviews and offers). And I polled readers of the Dynamic Ecology blog to get information about applicant and search committee behavior (e.g., regarding customization of applications to the hiring institution). These data address some widespread anxieties and misunderstandings about the ecology faculty job market, and also speak to gender diversity and equity in recent ecology faculty hiring. They complement, and in some cases improve on, other sources of information, such as anecdotal personal experiences.
1,176 downloads genetics
It has been known for 115 years that, in humans, diverse cognitive traits are positively intercorrelated; this forms the basis for the general factor of intelligence (g). We directly test for a genetic basis for g using data from seven different cognitive tests (N = 11,263 to N = 331,679) and genome-wide autosomal single nucleotide polymorphisms. A genetic g factor accounts for 58.4% (SE = 4.8%) of the genetic variance in the cognitive traits, with trait-specific genetic factors accounting for the remaining 41.6%. We distill genetic loci broadly relevant for many cognitive traits (g) from loci associated with only individual cognitive traits. These results elucidate the etiological basis for a long-known yet poorly-understood phenomenon, revealing a fundamental dimension of genetic sharing across diverse cognitive traits.
1,155 downloads genomics
Objective: Type 2 diabetes (T2D) is a complex disease characterized by pancreatic islet dysfunction, insulin resistance, and disruption of blood glucose levels. Genome wide association studies (GWAS) have identified >400 independent signals that encode genetic predisposition. More than 90% of the associated single nucleotide polymorphisms (SNPs) localize to non-coding regions and are enriched in chromatin-defined islet enhancer elements, indicating a strong transcriptional regulatory component to disease susceptibility. Pancreatic islets are a mixture of cell types that express distinct hormonal programs, and so each cell type may contribute differentially to the underlying regulatory processes that modulate T2D-associated transcriptional circuits. Existing chromatin profiling methods such as ATAC-seq and DNase-seq, applied to islets in bulk, produce aggregate profiles that mask important cellular and regulatory heterogeneity. Methods: We present genome-wide single cell chromatin accessibility profiles in >1,600 cells derived from a human pancreatic islet sample using single-cell-combinatorial-indexing ATAC-seq (sci-ATAC-seq). We also developed a deep learning model based on the U-Net architecture to accurately predict open chromatin peak calls in rare cell populations. Results: We show that sci-ATAC-seq profiles allow us to deconvolve alpha, beta, and delta cell populations and identify cell-type-specific regulatory signatures underlying T2D. In particular, we find that T2D GWAS SNPs are significantly enriched in beta cell-specific and cross cell-type shared islet open chromatin, but not in alpha or delta cell-specific open chromatin. We also demonstrate, using less abundant delta cells, that deep-learning models can improve signal recovery and feature reconstruction of rarer cell populations. Finally, we use co-accessibility measures to nominate the cell-specific target genes at 104 non-coding T2D GWAS signals.
Conclusions: Collectively, we identify the islet cell type of action across genetic signals of T2D predisposition and provide higher-resolution mechanistic insights into genetically encoded risk pathways.
1,137 downloads zoology
Centuries of zoological studies amassed billions of specimens in collections worldwide. Genomics of these specimens promises to rejuvenate biodiversity research. The obstacles stem from DNA degradation with specimen age. Overcoming this challenge, we set out to resolve a series of long-standing controversies involving a group of butterflies. We deduced geographical origins of several ancient specimens of uncertain provenance that are at the heart of these debates. Here, genomics tackles one of the greatest problems in zoology: countless old, poorly documented specimens that serve as irreplaceable embodiments of species concepts. The ability to figure out where they were collected will resolve many ongoing disputes. More broadly, we show the utility of genomics applied to ancient museum specimens to delineate the boundaries of species and populations, and to hypothesize about genotypic determinants of phenotypic traits.
1,121 downloads immunology
Antibody recognition of antigen relies on the specific interaction of amino acids at the paratope-epitope interface. A long-standing question in the fields of immunology and structural biology is whether paratope-epitope interaction is predictable. A fundamental premise for the predictability of paratope-epitope binding is the existence of structural units that are universally shared among antibody-antigen binding complexes. Here, we identified structural interaction motifs, which together compose a vocabulary of paratope-epitope binding that is shared among investigated antibody-antigen complexes. The vocabulary (i) is finite with fewer than 10⁴ motifs, (ii) mediates specific and non-redundant interactions between paratope-epitope pairs, (iii) is immunity-specific (distinct from the motif vocabulary used by non-immune protein-protein interactions), and (iv) enables the machine learning prediction of paratope or epitope. The discovery of a vocabulary of paratope-epitope interaction demonstrates the learnability and predictability of paratope-epitope interaction.
1,112 downloads genomics
Jiarui Ding, Xian Adiconis, Sean K Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K Hughes, Marc H Wadsworth, Tyler Burks, Lan T. Nguyen, John Y. H. Kwon, Boaz Barak, William Ge, Amanda J. Kedaigle, Shaina Carroll, Shuqiang Li, Nir Hacohen, Orit Rozenblatt-Rosen, Alex K Shalek, Alexandra-Chloé Villani, Aviv Regev, Joshua Z Levin
A multitude of single-cell RNA sequencing methods have been developed in recent years, with dramatic advances in scale and power, and enabling major discoveries and large scale cell mapping efforts. However, these methods have not been systematically and comprehensively benchmarked. Here, we directly compare seven methods for single cell and/or single nucleus profiling from three types of samples -- cell lines, peripheral blood mononuclear cells and brain tissue -- generating 36 libraries in six separate experiments in a single center. To analyze these datasets, we developed and applied scumi, a flexible computational pipeline that can be used for any scRNA-seq method. We evaluated the methods for both basic performance and for their ability to recover known biological information in the samples. Our study will help guide experiments with the methods in this study as well as serve as a benchmark for future studies and for computational algorithm development.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!