
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 50,150 bioRxiv papers from 233,740 authors.

Most downloaded bioRxiv papers since the beginning of last month

48,843 results found.

1: Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences

Posted to bioRxiv 29 Apr 2019

5,203 downloads synthetic biology

Alexander Rives, Siddharth Goyal, Joshua Meier, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, Rob Fergus

In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.

2: Comprehensive integration of single cell data

Posted to bioRxiv 02 Nov 2018

4,829 downloads genomics

Tim Stuart, Andrew Butler, Paul Hoffman, Christoph Hafemeister, Efthymia Papalexi, William M. Mauck, Marlon Stoeckius, Peter Smibert, Rahul Satija

Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to "anchor" diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat

3: Report of Partial findings from the National Toxicology Program Carcinogenesis Studies of Cell Phone Radiofrequency Radiation in Hsd: Sprague Dawley® SD rats (Whole Body Exposure)

Posted to bioRxiv 26 May 2016

4,342 downloads cancer biology

Michael Wyde, Mark Cesta, Chad Blystone, Susan Elmore, Paul Foster, Michelle Hooth, Grace Kissling, David Malarkey, Robert Sills, Matthew Stout, Nigel Walker, Kristine Witt, Mary Wolfe, John Bucher

The U.S. National Toxicology Program (NTP) has carried out extensive rodent toxicology and carcinogenesis studies of radiofrequency radiation (RFR) at frequencies and modulations used in the U.S. telecommunications industry. This report presents partial findings from these studies. The occurrences of two tumor types in male Harlan Sprague Dawley rats exposed to RFR, malignant gliomas in the brain and schwannomas of the heart, were considered of particular interest and are the subject of this report. The findings in this report were reviewed by expert peer reviewers selected by the NTP and National Institutes of Health (NIH). These reviews and responses to comments are included as appendices to this report, and revisions to the current document have incorporated and addressed these comments. When the studies are completed, they will undergo additional peer review before publication in full as part of the NTP's Toxicology and Carcinogenesis Technical Reports Series. No portion of this work has been submitted for publication in a scientific journal. Supplemental information in the form of four additional manuscripts has been or will soon be submitted for publication. These manuscripts describe in detail the designs and performance of the RFR exposure system, the dosimetry of RFR exposures in rats and mice, the results of a series of pilot studies establishing the ability of the animals to thermoregulate during RFR exposures, and studies of DNA damage. (1) Capstick M, Kuster N, Kuhn S, Berdinas-Torres V, Wilson P, Ladbury J, Koepke G, McCormick D, Gauger J, and Melnick R. A radio frequency radiation reverberation chamber exposure system for rodents; (2) Yijian G, Capstick M, McCormick D, Gauger J, Horn T, Wilson P, Melnick RL, and Kuster N. Life time dosimetric assessment for mice and rats exposed to cell phone radiation; (3) Wyde ME, Horn TL, Capstick M, Ladbury J, Koepke G, Wilson P, Stout MD, Kuster N, Melnick R, Bucher JR, and McCormick D. Pilot studies of the National Toxicology Program's cell phone radiofrequency radiation reverberation chamber exposure system; (4) Smith-Roe SL, Wyde ME, Stout MD, Winters J, Hobbs CA, Shepard KG, Green A, Kissling GE, Tice RR, Bucher JR, and Witt KL. Evaluation of the genotoxicity of cell phone radiofrequency radiation in male and female rats and mice following subchronic exposure.

4: Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion

Posted to bioRxiv 18 Apr 2019

3,925 downloads genomics

Ansuman T. Satpathy, Jeffrey M. Granja, Kathryn E Yost, Yanyan Qi, Francesca Meschi, Geoffrey P McDermott, Brett N Olsen, Maxwell R. Mumbach, Sarah E Pierce, M. Ryan Corces, Preyas Shah, Jason C. Bell, Darisha Jhutty, Corey M Nemec, Jean Wang, Li Wang, Yifeng Yin, Paul G Giresi, Anne Lynn S. Chang, Grace X Y Zheng, William J. Greenleaf, Howard Y. Chang

Understanding complex tissues requires single-cell deconstruction of gene regulation with precision and scale. Here we present a massively parallel droplet-based platform for mapping transposase-accessible chromatin in tens of thousands of single cells per sample (scATAC-seq). We obtain and analyze chromatin profiles of over 200,000 single cells in two primary human systems. In blood, scATAC-seq allows marker-free identification of cell type-specific cis- and trans-regulatory elements, mapping of disease-associated enhancer activity, and reconstruction of trajectories of differentiation from progenitors to diverse and rare immune cell types. In basal cell carcinoma, scATAC-seq reveals regulatory landscapes of malignant, stromal, and immune cell types in the tumor microenvironment. Moreover, scATAC-seq of serial tumor biopsies before and after PD-1 blockade allows identification of chromatin regulators and differentiation trajectories of therapy-responsive intratumoral T cell subsets, revealing a shared regulatory program driving CD8+ T cell exhaustion and CD4+ T follicular helper cell development. We anticipate that droplet-based single-cell chromatin accessibility will provide a broadly applicable means of identifying regulatory factors and elements that underlie cell type and function.

5: Y-chromosome haplogroups from Hun, Avar and conquering Hungarian period nomadic people of the Carpathian Basin

Posted to bioRxiv 03 Apr 2019

3,712 downloads genetics

Endre Neparaczki, Zoltan Maroti, Tibor Kalmar, Kitti Maar, Istvan Nagy, Dora Latinovics, Agnes Kustar, Gyorgy Palfi, Erika Molnar, Antonia Marcsik, Csilla Balogh, Gabor Lorinczy, Szilard Gal, Peter Tomka, Bernadett Kovacsoczy, Laszlo Kovacs, Istvan Rasko, Tibor Torok

Hun, Avar and conquering Hungarian nomadic groups arrived in the Carpathian Basin from the Eurasian Steppes and significantly influenced its political and ethnic landscape. To shed light on the genetic affinity of these groups, we determined Y-chromosomal haplogroups and autosomal loci from 49 individuals thought to represent military leaders. Haplogroups from the Hun age are consistent with Xiongnu ancestry of European Huns. Most of the Avar-age individuals carry east Eurasian Y haplogroups typical of modern north-eastern Siberian and Buryat populations, and their autosomal loci indicate mostly unmixed Asian characteristics. In contrast, the conquering Hungarians seem to be a recently assembled population incorporating pure European, Asian and admixed components. Their heterogeneous paternal and maternal lineages indicate a similar phylogeographic origin of males and females, derived from Central-Inner Asian and European Pontic Steppe sources. The composition of conquering Hungarian paternal lineages is very similar to that of Bashkirs, supporting historical sources that report the identity of the two groups.

6: End-to-end differentiable learning of protein structure

Posted to bioRxiv 14 Feb 2018

3,383 downloads bioinformatics

Mohammed AlQuraishi

Accurate prediction of protein structure is one of the central challenges of biochemistry. Despite significant progress made by co-evolution methods to predict protein structure from signatures of residue-residue coupling found in the evolutionary record, a direct and explicit mapping between protein sequence and structure remains elusive, with no substantial recent progress. Meanwhile, rapid developments in deep learning, which have found remarkable success in computer vision, natural language processing, and quantum chemistry raise the question of whether a deep learning based approach to protein structure could yield similar advancements. A key ingredient of the success of deep learning is the reformulation of complex, human-designed, multi-stage pipelines with differentiable models that can be jointly optimized end-to-end. We report the development of such a model, which reformulates the entire structure prediction pipeline using differentiable primitives. Achieving this required combining four technical ideas: (1) the adoption of a recurrent neural architecture to encode the internal representation of protein sequence, (2) the parameterization of (local) protein structure by torsional angles, which provides a way to reason over protein conformations without violating the covalent chemistry of protein chains, (3) the coupling of local protein structure to its global representation via recurrent geometric units, and (4) the use of a differentiable loss function to capture deviations between predicted and experimental structures. To our knowledge this is the first end-to-end differentiable model for learning of protein structure. We test the effectiveness of this approach using two challenging tasks: the prediction of novel protein folds without the use of co-evolutionary information, and the prediction of known protein folds without the use of structural templates. 
On the first task the model achieves state-of-the-art performance, even when compared to methods that rely on co-evolutionary data. On the second task the model is competitive with methods that use experimental protein structures as templates, achieving 3-7Å accuracy despite being template-free. Beyond protein structure prediction, end-to-end differentiable models of proteins represent a new paradigm for learning and modeling protein structure, with potential applications in docking, molecular dynamics, and protein design.

7: The Genomic Formation of South and Central Asia

Posted to bioRxiv 31 Mar 2018

3,312 downloads genomics

Vagheesh M Narasimhan, Nick J Patterson, Priya Moorjani, Iosif Lazaridis, Lipson Mark, Swapan Mallick, Nadin Rohland, Rebecca Bernardos, Alexander M Kim, Nathan Nakatsuka, Inigo Olalde, Alfredo Coppa, James Mallory, Vyacheslav Moiseyev, Janet Monge, Luca M Olivieri, Nicole Adamski, Nasreen Broomandkhoshbacht, Francesca Candilio, Olivia Cheronet, Brendan J Culleton, Matthew Ferry, Daniel Fernandes, Beatriz Gamarra, Daniel Gaudio, Mateja Hajdinjak, Eadaoin Harney, Thomas K Harper, Denise Keating, Ann-Marie Lawson, Megan Michel, Mario Novak, Jonas Oppenheimer, Niraj Rai, Kendra Sirak, Viviane Slon, Kristin Stewardson, Zhao Zhang, Gaziz Akhatov, Anatoly N Bagashev, Baurzhan Baitanayev, Gian Luca Bonora, Tatiana Chikisheva, Anatoly Derevianko, Enshin Dmitry, Katerina Douka, Nadezhda Dubova, Andrey Epimakhov, Suzanne Freilich, Dorian Fuller, Alexander Goryachev, Andrey Gromov, Bryan Hanks, Margaret Judd, Erlan Kazizov, Aleksander Khokhlov, Egor Kitov, Elena Kupriyanova, Pavel Kuznetsov, Donata Luiselli, Farhad Maksudov, Chris Meiklejohn, Deborah C Merrett, Roberto Micheli, Oleg Mochalov, Zahir Muhammed, Samridin Mustafakulov, Ayushi Nayak, Rykun M Petrovna, Davide Pettner, Richard Potts, Dmitry Razhev, Stefania Sarno, Kulyan Sikhymbaevae, Sergey M Slepchenko, Nadezhda Stepanova, Svetlana Svyatko, Sergey Vasilyev, Massimo Vidale, Dima Voyakin, Antonina Yermolayeva, Alisa Zubova, Vasant S Shinde, Carles Lalueza-Fox, Matthias Meyer, David Anthony, Nicole Boivin, Kumarasmy Thangaraj, Douglas Kennett, Michael Frachetti, Ron Pinhasi, David Reich

The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan. By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia (consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC), and we develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identify the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.

8: Recovery of trait heritability from whole genome sequence data

Posted to bioRxiv 25 Mar 2019

3,127 downloads genetics

Pierrick Wainschtein, Deepti P Jain, Loic Yengo, Zhili Zheng, TOPMed Anthropometry Working Group, Trans-Omics for Precision Medicine Consortium, L Adrienne Cupples, Aladdin H Shadyab, Barbara McKnight, Benjamin M Shoemaker, Braxton D Mitchell, Bruce M Psaty, Charles Kooperberg, Dan Roden, Dawood Darbar, Donna K. Arnett, Elizabeth A Regan, Eric Boerwinkle, Jerome I Rotter, Matthew A Allison, Merry-Lynn N McDonald, Mina K. Chung, Nicholas L Smith, Patrick T Ellinor, Ramachandran S Vasan, Rasika A. Mathias, Stephen S Rich, Susan R Heckbert, Susan Redline, Xiuqing Guo, Y-D Ida Chen, Ching-Ti Liu, Mariza de Andrade, Lisa R. Yanek, Christine M Albert, Ryan D. Hernandez, Stephen T McGarvey, Kari E. North, Leslie A Lange, Bruce S. Weir, Cathy C. Laurie, Jian Yang, Peter M. Visscher

Heritability, the proportion of phenotypic variance explained by genetic factors, can be estimated from pedigree data, but such estimates are uninformative with respect to the underlying genetic architecture. Analyses of data from genome-wide association studies (GWAS) on unrelated individuals have shown that for human traits and disease, approximately one-third to two-thirds of heritability is captured by common SNPs. It is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular if the causal variants are rare, or other reasons such as over-estimation of heritability from pedigree data. Here we show that pedigree heritability for height and body mass index (BMI) appears to be fully recovered from whole-genome sequence (WGS) data on 21,620 unrelated individuals of European ancestry. We assigned 47.1 million genetic variants to groups based upon their minor allele frequencies (MAF) and linkage disequilibrium (LD) with variants nearby, and estimated and partitioned variation accordingly. The estimated heritability was 0.79 (SE 0.09) for height and 0.40 (SE 0.09) for BMI, consistent with pedigree estimates. Low-MAF variants in low LD with neighbouring variants were enriched for heritability, to a greater extent for protein altering variants, consistent with negative selection thereon. Cumulatively variants in the MAF range of 0.0001 to 0.1 explained 0.54 (SE 0.05) and 0.51 (SE 0.11) of heritability for height and BMI, respectively. Our results imply that the still missing heritability of complex traits and disease is accounted for by rare variants, in particular those in regions of low LD.
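As a rough illustration of the definition used in this abstract (heritability as the share of phenotypic variance attributed to genetic factors, partitioned across variant groups), the Python sketch below sums per-group genetic variance components and divides by the total phenotypic variance. The function name, the group labels and all numbers are hypothetical; the sketch does not estimate variance components from genotype data, which is the hard part the paper addresses.

```python
from typing import Dict, Tuple


def partitioned_heritability(
    genetic_variances: Dict[str, float], residual_variance: float
) -> Tuple[float, Dict[str, float]]:
    """Return total heritability and the share contributed by each variant group.

    genetic_variances maps a (hypothetical) MAF/LD group label to the genetic
    variance attributed to that group; residual_variance is everything else.
    """
    total_genetic = sum(genetic_variances.values())
    phenotypic = total_genetic + residual_variance
    per_group = {name: v / phenotypic for name, v in genetic_variances.items()}
    return total_genetic / phenotypic, per_group


# Made-up variance components, for illustration only.
components = {"rare_low_ld": 0.25, "common": 0.5}
h2, per_group = partitioned_heritability(components, residual_variance=0.25)
print(h2, per_group)
```

Because the per-group shares add up to the total, a breakdown like this shows directly how much of the estimate comes from rare versus common variants.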

9: MASST: A Web-based Basic Mass Spectrometry Search Tool for Molecules to Search Public Data.

Posted to bioRxiv 28 Mar 2019

2,958 downloads bioinformatics

Mingxun Wang, Alan K. Jarmusch, Fernando Vargas, Alexander A. Aksenov, Julia Gauglitz, Kelly Weldon, Daniel Petras, Ricardo da Silva, Robby Quinn, Alexey Melnik, Justin J.J. van der Hooft, Andres Mauricio Caraballo Rodriguez, Louis Felix Nothias, Christine M. Aceves, Morgan Panitchpakdi, Elizabeth Brown, Francesca Di Ottavio, Nicole Sikora, Emmanuel O. Elijah, Lara Labarta-Bajo, Emily G. Gentry, Shabnam Shalapour, Kathleen E. Kyle, Sara P. Puckett, Jeramie D. Watrous, Carolina S. Carpenter, Amina Bouslimani, Madeleine Ernst, Austin D Swafford, Elina I Zuniga, Marcy J. Balunas, Jonathan L. Klassen, Rohit Loomba, Rob Knight, Nuno Bandeira, Pieter C Dorrestein

We introduce a web-enabled small-molecule mass spectrometry (MS) search engine. To date, no tool can query all the public small-molecule tandem MS data in metabolomics repositories, greatly limiting the utility of these resources in clinical, environmental and natural product applications. Therefore, we introduce a Mass Spectrometry Search Tool (MASST) (https://proteosafe-extensions.ucsd.edu/masst/), that enables the discovery of molecular relationships among accessible public metabolomics and natural product tandem mass spectrometry data (MS/MS).

10: A reference map of the human protein interactome

Posted to bioRxiv 10 Apr 2019

2,896 downloads systems biology

Katja Luck, Dae-Kyum Kim, Luke Lambourne, Kerstin Spirohn, Bridget E Begg, Wenting Bian, Ruth Brignall, Tiziana Cafarelli, Francisco J Campos-Laborie, Benoit Charloteaux, Dongsic Choi, Atina G. Cote, Meaghan Daley, Steven Deimling, Alice Desbuleux, Amelie Dricot, Marinella Gebbia, Madeleine F Hardy, Nishka Kishore, Jennifer J Knapp, Istvan A Kovacs, Irma Lemmens, Miles W Mee, Joseph C. Mellor, Carl Pollis, Carles Pons, Aaron D Richardson, Sadie Schlabach, Bridget Teeking, Anupama Yadav, Mariana Babor, Dawit Balcha, Omar Basha, Christian Bowman-Colin, Suet-Feung Chin, Soon Gang Choi, Claudia Colabella, Georges Coppin, Cassandra D'Amata, David De Ridder, Steffi De Rouck, Miquel Duran-Frigola, Hanane Ennajdaoui, Florian Goebels, Liana Goehring, Anjali Gopal, Ghazal Haddad, Elodie Hatchi, Mohamed Helmy, Yves Jacob, Yoseph Kassa, Serena Landini, Roujia Li, Natascha van Lieshout, Andrew MacWilliams, Dylan Markey, Joseph N Paulson, Sudharshan Rangarajan, John Rasla, Ashyad Rayhan, Thomas Rolland, Adriana San-Miguel, Yun Shen, Dayag Sheykhkarimli, Gloria M. Sheynkman, Eyal Simonovsky, Murat Taşan, Alexander Tejeda, Jean-Claude Twizere, Yang Wang, Robert J. Weatheritt, Jochen Weile, Yu Xia, Xinping Yang, Esti Yeger-Lotem, Quan Zhong, Patrick Aloy, Gary D. Bader, Javier De Las Rivas, Suzanne Gaudet, Tong Hao, Janusz Rak, Jan Tavernier, Vincent Tropepe, David E. Hill, Marc Vidal, Frederick P Roth, Michael A. Calderwood

Global insights into cellular organization and function require comprehensive understanding of interactome networks. Similar to how a reference genome sequence revolutionized human genetics, a reference map of the human interactome network is critical to fully understand genotype-phenotype relationships. Here we present the first human "all-by-all" binary reference interactome map, or "HuRI". With ~53,000 high-quality protein-protein interactions (PPIs), HuRI is approximately four times larger than the information curated from small-scale studies available in the literature. Integrating HuRI with genome, transcriptome and proteome data enables the study of cellular function within essentially any physiological or pathological cellular context. We demonstrate the use of HuRI in identifying specific subcellular roles of PPIs and protein function modulation via splicing during brain development. Inferred tissue-specific networks reveal general principles for the formation of cellular context-specific functions and elucidate potential molecular mechanisms underlying tissue-specific phenotypes of Mendelian diseases. HuRI thus represents an unprecedented, systematic reference linking genomic variation to phenotypic outcomes.

11: CRISPR-Cas9 Gene Editing in Lizards Through Microinjection of Unfertilized Oocytes

Posted to bioRxiv 31 Mar 2019

2,871 downloads genetics

Ashley M. Rasys, Sungdae Park, Rebecca E. Ball, Aaron J. Alcala, James D. Lauderdale, Douglas B. Menke

CRISPR-Cas-mediated gene editing has enabled the direct manipulation of gene function in many species. However, the reproductive biology of reptiles presents unique barriers to the use of this technology, and no effective methods for targeted mutagenesis currently exist for any reptile. Here we present a new approach that enables the efficient production of CRISPR-Cas-induced mutations in Anolis lizards, an important model for studies of reptile evolution and development.

12: A guide to performing Polygenic Risk Score analyses

Posted to bioRxiv 14 Sep 2018

2,806 downloads genomics

Shing Wan Choi, Timothy Mak, Paul F O'Reilly

The application of polygenic risk scores (PRS) has become routine in genetic epidemiological studies. Among a range of applications, PRS are commonly used to assess shared aetiology among different phenotypes and to evaluate the predictive power of genetic data, while they are also now being exploited as part of study design, in which experiments are performed on individuals, or their biological samples (e.g., tissues, cells), at the tails of the PRS distribution and contrasted. As GWAS sample sizes increase and PRS become more powerful, they are also set to play a key role in personalised medicine. Despite their growing application and importance, there are limited guidelines for performing PRS analyses, which can lead to inconsistency between studies and misinterpretation of results. Here we provide detailed guidelines for performing polygenic risk score analyses relevant to different methods for their calculation, outlining standard quality control steps and offering recommendations for best practice. We also discuss different methods for the calculation of PRS, common misconceptions regarding the interpretation of results and future challenges.
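The abstract does not spell out how a score is computed. In its most common basic form, a PRS is the sum of an individual's risk-allele dosages weighted by GWAS effect sizes. The Python sketch below shows only that weighted sum; the SNP identifiers, effect sizes and dosages are invented for illustration, and the quality control, clumping and thresholding steps that such a guide would cover are deliberately left out.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum effect-size-weighted allele dosages over SNPs present in both inputs.

    dosages: SNP id -> number of effect alleles carried (0, 1 or 2, or an imputed value)
    weights: SNP id -> per-allele effect size from a GWAS (hypothetical values here)
    """
    shared = dosages.keys() & weights.keys()  # ignore SNPs missing from either input
    return sum(weights[snp] * dosages[snp] for snp in shared)


# Invented effect sizes and genotypes, for illustration only.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.125}
person = {"rs1": 2, "rs2": 1, "rs4": 0}  # rs3 not genotyped, rs4 has no weight
score = polygenic_risk_score(person, weights)
print(score)
```

Restricting the sum to shared SNPs is one simple way to handle missing genotypes; real pipelines often impute or mean-fill instead, which changes the scale of the score.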

13: Toxicity of JUUL Fluids and Aerosols Correlates Strongly with Nicotine and Some Flavor Chemical Concentrations

Posted to bioRxiv 09 Dec 2018

2,745 downloads pharmacology and toxicology

Esther Omaiye, Kevin J McWhirter, Wentai Luo, James F Pankow, Prue Talbot

While JUUL electronic cigarettes (ECs) have captured the majority of the EC market with a large fraction of their sales going to adolescents, little is known about their cytotoxicity and potential effects on health. The purpose of this study was to determine flavor chemical and nicotine concentrations in the eight currently marketed pre-filled JUUL EC cartridges (pods) and to evaluate the cytotoxicity of the different variants (e.g., Cool Mint and Creme Brulee) using in vitro assays. Nicotine and flavor chemicals were analyzed using gas chromatography/mass spectrometry in pod fluid before and after vaping and in the corresponding aerosols. Fifty-nine flavor chemicals were identified in JUUL pod fluids, and three were present at >1 mg/mL. Duplicate pods were similar in flavor chemical composition and concentration. Nicotine concentrations (average 60.9 mg/mL) were significantly higher than in any EC products we have analyzed previously. Transfer efficiency of nicotine and of individual flavor chemicals that were >1 mg/mL from the pod fluid into aerosols was generally 35-80%. All pod fluids were cytotoxic at a 1:10 dilution (10%) in the MTT and neutral red uptake assays when tested with BEAS-2B lung epithelial cells. Most aerosols were cytotoxic in these assays at concentrations >1%. The cytotoxicity of aerosols was highly correlated with nicotine and ethyl maltol concentrations and moderately to weakly correlated with total flavor chemical concentration and menthol concentration. Our study demonstrates that: (1) some JUUL flavor pods have high concentrations of flavor chemicals that may make them attractive to youth, and (2) the concentrations of nicotine and some flavor chemicals (e.g., ethyl maltol) are high enough to be cytotoxic in acute in vitro assays, emphasizing the need to determine if JUUL products will lead to adverse health effects with chronic use.

14: Using DeepLabCut for 3D markerless pose estimation across species and behaviors

Posted to bioRxiv 24 Nov 2018

2,733 downloads neuroscience

Tanmay Nath, Alexander Mathis, An Chi Chen, Amir Patel, Matthias Bethge, Mackenzie W. Mathis

Noninvasive behavioral tracking of animals during experiments is crucial to many scientific pursuits. Extracting the poses of animals without using markers is often essential for measuring behavioral effects in biomechanics, genetics, ethology and neuroscience. Yet, extracting detailed poses without markers in dynamically changing backgrounds has been challenging. We recently introduced an open source toolbox called DeepLabCut that builds on a state-of-the-art human pose estimation algorithm to allow a user to train a deep neural network using limited training data to track user-defined features with an accuracy that matches human labeling. Here, we provide an updated toolbox that is self-contained within a Python package and includes new features such as graphical user interfaces and active-learning-based network refinement. Lastly, we provide a step-by-step guide for using DeepLabCut.

15: Automated analysis of whole brain vasculature using machine learning

Posted to bioRxiv 18 Apr 2019

2,595 downloads neuroscience

Mihail Ivilinov Todorov, Johannes C. Paetzold, Oliver Schoppe, Giles Tetteh, Velizar Efremov, Katalin Voelgyi, Marco Duering, Martin Dichgans, Marie Piraud, Bjoern Menze, Ali Erturk

Tissue clearing methods enable imaging of intact biological specimens without sectioning. However, reliable and scalable analysis of such large imaging data in 3D remains a challenge. Towards this goal, we developed a deep learning-based framework to quantify and analyze the brain vasculature, named Vessel Segmentation & Analysis Pipeline (VesSAP). Our pipeline uses a fully convolutional network with a transfer learning approach for segmentation. We systematically analyzed vascular features of the whole brains including their length, bifurcation points and radius at the micrometer scale by registering them to the Allen mouse brain atlas. We reported the first evidence of secondary intracranial collateral vascularization in CD1-Elite mice and found reduced vascularization in the brainstem as compared to the cerebrum. VesSAP thus enables unbiased and scalable quantifications for the angioarchitecture of the cleared intact mouse brain and yields new biological insights related to the vascular brain function.

16: Droplet-based combinatorial indexing for massive scale single-cell epigenomics

Posted to bioRxiv 18 Apr 2019

2,467 downloads genomics

Caleb Lareau, Fabiana M Duarte, Jennifer G Chew, Vinay K Kartha, Zachary D Burkett, Andrew S Kohlway, Dmitry Pokholok, Martin J Aryee, Frank J Steemers, Ronald Lebofsky, Jason Daniel Buenrostro

While recent technical advancements have facilitated the mapping of epigenomes at single-cell resolution, the throughput and quality of these methods have limited the widespread adoption of these technologies. Here, we describe a droplet microfluidics platform for single-cell assay for transposase accessible chromatin (scATAC-seq) for high-throughput single-cell profiling of chromatin accessibility. We use this approach for the unbiased discovery of cell types and regulatory elements within the mouse brain. Further, we extend the throughput of this approach by pairing combinatorial indexing with droplet microfluidics, enabling single-cell studies at a massive scale. With this approach, we measure chromatin accessibility across resting and stimulated human bone marrow derived cells to reveal changes in the cis- and trans-regulatory landscape across cell types and upon stimulation conditions at single-cell resolution. Altogether, we describe a total of 502,207 single-cell profiles, demonstrating the scalability and flexibility of this droplet-based platform.

17: Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression

Posted to bioRxiv 14 Mar 2019

2,320 downloads genomics

Christoph Hafemeister, Rahul Satija

Single-cell RNA-seq (scRNA-seq) data exhibits significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from 'regularized negative binomial regression', where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure omits the need for heuristic steps including pseudocount addition or log-transformation, and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform (https://github.com/ChristophH/sctransform), with a direct interface to our single-cell toolkit Seurat.
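As a rough illustration of the Pearson residual described in this abstract, the sketch below computes one residual for a single gene in a single cell under a negative binomial model with a log link and log10 sequencing depth as the covariate. The function name, argument names, and parameter values are hypothetical and are not taken from the sctransform package. The regularization step (pooling parameter estimates across genes of similar abundance) is not shown; the coefficients and dispersion are assumed to have been estimated already.

```python
import math


def nb_pearson_residual(count, depth, beta0, beta1, theta):
    """Pearson residual of one UMI count under a negative binomial GLM.

    count: observed UMI count for one gene in one cell
    depth: total UMI count of that cell (sequencing depth)
    beta0, beta1: assumed, already-fitted intercept and slope (hypothetical)
    theta: assumed, already-fitted dispersion parameter (hypothetical)
    """
    # Expected count from a log-link model with log10 depth as the covariate.
    mu = math.exp(beta0 + beta1 * math.log10(depth))
    # Negative binomial variance: mu + mu^2 / theta.
    variance = mu + mu ** 2 / theta
    return (count - mu) / math.sqrt(variance)


# Illustrative call with made-up parameter values.
residual = nb_pearson_residual(count=3, depth=1000, beta0=0.0, beta1=0.0, theta=1.0)
```

A residual near zero means the observed count is close to what sequencing depth alone would predict; large positive or negative values flag counts that depth does not explain.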

18: Genetic Associations with Mathematics Tracking and Persistence in Secondary School

Posted to bioRxiv 05 Apr 2019

2,292 downloads genetics

Kathryn Paige Harden, Benjamin W Domingue, Daniel W Belsky, Jason Boardman, Robert Crosnoe, Margherita Malanchini, Michel G. Nivard, Elliot M Tucker-Drob, Kathleen Mullan Harris

Maximizing the flow of students through the science, technology, engineering, and math (STEM) pipeline is important to promoting human capital development and reducing economic inequality. A critical juncture in the STEM pipeline is the highly-cumulative sequence of secondary school math courses. Students from disadvantaged schools are less likely to complete advanced math courses, but debate continues about why. Here, we address this question using student polygenic scores, which are DNA-based indicators of propensity to succeed in education. We integrated genetic and official school transcript data from over 3,000 European-ancestry students from U.S. high schools. We used polygenic scores as a molecular tracer to understand how the flow of students through the high school math pipeline differs in socioeconomically advantaged versus disadvantaged schools. Students with higher education polygenic scores were tracked to more advanced math already at the beginning of high school and persisted in math for more years. Molecular tracer analyses revealed that the dynamics of the math pipeline differed by school advantage. Compared to disadvantaged schools, advantaged schools tracked more students with high polygenic scores into advanced math classes at the start of high school, and they buffered students with low polygenic scores from dropping out of math. Across all schools, even students with exceptional polygenic scores (top 2%) were unlikely to take the most advanced math classes, suggesting substantial room for improvement in the development of potential STEM talent. These results link new molecular genetic discoveries to a common target of educational-policy reforms.

19: Gamblers: an Antibiotic-induced Evolvable Cell Subpopulation Differentiated by Reactive-oxygen-induced General Stress Response

Posted to bioRxiv 11 Dec 2018

2,205 downloads microbiology

John P Pribis, Libertad García-Villada, Yin Zhai, Ohad Lewin-Epstein, Anthony Wang, Jingjing Liu, Jun Xia, Qian Mei, Devon M Fitzgerald, Julia Bos, Robert Austin, Christophe Herman, David Bates, Lilach Hadany, P.J. Hastings, Susan M Rosenberg

Antibiotics can induce mutations that cause antibiotic resistance. Yet, despite their importance, mechanisms of antibiotic-promoted mutagenesis remain elusive. We report that the fluoroquinolone antibiotic ciprofloxacin (cipro) induces mutations that cause drug resistance by triggering differentiation of a mutant-generating cell subpopulation, using reactive oxygen species (ROS) to signal the sigma-S (σS) general-stress response. Cipro-generated DNA breaks activate the SOS DNA-damage response and error-prone DNA polymerases in all cells. However, mutagenesis is restricted to a cell subpopulation in which electron transfer and SOS induce ROS, which activate the σS response, allowing mutagenesis during DNA-break repair. When sorted, this small σS-response-'on' subpopulation produces most antibiotic cross-resistant mutants. An FDA-approved drug prevents σS induction, specifically inhibiting antibiotic-promoted mutagenesis. Furthermore, SOS-inhibited cell division, causing multi-chromosome cells, is required for mutagenesis. The data support a model in which within-cell chromosome cooperation together with development of a 'gambler' cell subpopulation promote resistance evolution without risking most cells.

20: Population imaging of neural activity in awake behaving mice in multiple brain regions

Posted to bioRxiv 23 Apr 2019

2,164 downloads neuroscience

Kiryl D. Piatkevich, Seth Bensussen, Hua-an Tseng, Sanaya N. Shroff, Violetta Giselle Lopez-Huerta, Demian Park, Erica E. Jung, Or A. Shemesh, Christoph Straub, Howard J Gritton, Michael F. Romano, Emma Costa, Bernardo L. Sabatini, Zhanyan Fu, Edward S Boyden, Xue Han

A longstanding goal in neuroscience has been to image membrane voltage, with high temporal precision and sensitivity, in awake behaving mammals. Here, we report a genetically encoded voltage indicator, SomArchon, which exhibits millisecond response times and compatibility with optogenetic control, and which increases the sensitivity, signal-to-noise ratio, and number of neurons observable, by manyfold over previous reagents. SomArchon only requires conventional one-photon microscopy to achieve these high performance characteristics. These improvements enable population analysis of neural activity, both at the subthreshold and spiking levels, in multiple brain regions: cortex, hippocampus, and striatum of awake behaving mice. Using SomArchon, we detect both positive and negative responses of striatal neurons during movement, highlighting the power of voltage imaging to reveal bidirectional modulation. We also examine how the intracellular subthreshold theta oscillations of hippocampal neurons govern spike output, finding that nearby cells can exhibit highly correlated subthreshold activities, even as they generate highly divergent spiking patterns.
