Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 57,506 bioRxiv papers from 264,779 authors.
Most downloaded bioRxiv papers, since the beginning of last month
56,052 results found.
900 downloads genetics
Endre Neparaczki, Zoltan Maroti, Tibor Kalmar, Kitti Maar, Istvan Nagy, Dora Latinovics, Agnes Kustar, Gyorgy Palfi, Erika Molnar, Antonia Marcsik, Csilla Balogh, Gabor Lorinczy, Szilard Gal, Peter Tomka, Bernadett Kovacsoczy, Laszlo Kovacs, Istvan Rasko, Tibor Torok
Hun, Avar and conquering Hungarian nomadic groups arrived in the Carpathian Basin from the Eurasian Steppes and significantly influenced its political and ethnic landscape. In order to shed light on the genetic affinity of the above groups we have determined Y chromosomal haplogroups and autosomal loci from 49 individuals presumed to represent military leaders. Haplogroups from the Hun age are consistent with Xiongnu ancestry of European Huns. Most of the Avar-age individuals carry east Eurasian Y haplogroups typical for modern north-eastern Siberian and Buryat populations, and their autosomal loci indicate mostly unmixed Asian characteristics. In contrast, the conquering Hungarians seem to be a recently assembled population incorporating pure European, Asian and admixed components. Their heterogeneous paternal and maternal lineages indicate similar phylogeographic origin of males and females, derived from Central-Inner Asian and European Pontic Steppe sources. Composition of conquering Hungarian paternal lineages is very similar to that of Bashkirs, supporting historical sources that report identity of the two groups.
898 downloads bioinformatics
The rapid development of novel spatial transcriptomics technologies has provided new opportunities to investigate the interactions between cells and their native microenvironment. However, effective use of such technologies requires the development of innovative computational algorithms and pipelines. Here we present Giotto, a comprehensive, flexible, robust, and open-source pipeline for spatial transcriptomic data analysis and visualization. The data analysis module implements a wide range of algorithms ranging from basic tasks such as data pre-processing to innovative approaches for cell-cell interaction characterization. The data visualization module provides a user-friendly workspace that allows users to interactively visualize, explore and compare multiple layers of information. These two modules can be used iteratively for refined analysis and hypothesis development. We illustrate the functionalities of Giotto by using the recently published seqFISH+ dataset for mouse brain. Our analysis highlights the utility of Giotto for characterizing tissue spatial organization as well as for the interactive exploration of multi-layer information in spatial transcriptomic and imaging data. We find that single-cell resolution spatial information is essential for the investigation of ligand-receptor mediated cell-cell interactions. Giotto is generally applicable and can be easily integrated with external software packages for multi-omic data integration.
895 downloads microbiology
David C. Danko, Daniela Bezdan, Ebrahim Afshinnekoo, Sofia Ahsanuddin, Josue Alicea, Chandrima Bhattacharya, Malay Bhattacharyya, Ran Blekhman, Daniel J Butler, Eduardo Castro-Nallar, Ana M Canas, Aspassia D Chatziefthimiou, Kern Rei Chng, David A Coil, Denise Syndercombe Court, Robert W Crawford, Christelle Desnues, Emmanuel Dias-Neto, Daisy Donnellan, Marius Dybwad, Jonathan A. Eisen, Eran Elhaik, Danilo Ercolini, Francesca De Filippis, Alina Frolova, Alexandra B Graf, David C Green, Patrick K. H. Lee, Jochen Hecht, Mark Hernandez, Soojin Jang, Andre Kahles, Mikhail Karasikov, Kaymisha Knights, Nikos C. Kyrpides, Per Ljungdahl, Abigail Lyons, Gabriella Mason-Buck, Ken McGrath, Emmanuel F Mongodin, Harun Mustafa, Beth Mutai, Niranjan Nagarajan, Russell Y Neches, Amanda Ng, Marina Nieto-Caballero, Olga Nikolayeva, Tatyana Nikolayeva, Houtan Noushmehr, Manuela Oliveira, Stephan Ossowski, Olayinka O Osuolale, David Paez-Espino, Eileen Png, Nicolas Rascovan, Hugues Richard, Gunnar Ratsch, Jorge L Sanchez, Lynn M Schriml, Heba Shaaban, Leming Shi, Maria A Sierra, Le Huu Song, Haruo Suzuki, Dominique Thomas, Klas I Udekwu, Juan A. Ugalde, Brandon Valentine, Dimitar I Vassilev, Elena Vayndorf, Marcus H Y Leung, Ben Young, Maria M Zambrano, Jifeng Zhu, Sibo Zhu, Pawel P Labaj, Christopher E Mason
Although studies have shown that urban environments and mass-transit systems have geospatially distinct metagenomes, no study has ever systematically studied these dense, human/microbial ecosystems around the world. To address this gap in knowledge, we created a global metagenomic and antimicrobial resistance (AMR) atlas of urban mass transit systems from 58 cities, spanning 3,741 samples and 4,424 taxonomically-defined microorganisms collected over three years. The map provides annotated, geospatial data about microbial strains, functional genetics, antimicrobial resistance, and novel genetic elements, including 10,928 novel predicted viral species. Urban microbiomes often resemble human commensal microbiomes from the skin and airways but contain a consistent "core" of 61 species which are predominantly not human commensal species. These data also show that AMR density across cities varies by several orders of magnitude with many AMRs present on plasmids with cosmopolitan distributions. Conversely, samples may be accurately (91.4%) classified to their city-of-origin using a linear support vector machine over taxa. Together, these results constitute a high-resolution global metagenomic atlas, which enables the discovery of new genetic components of the built human environment, forensic application, and an essential first draft of the global AMR burden of the world's cities.
893 downloads biophysics
Bacteria have evolved adaptive immune systems encoded by Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and the CRISPR-associated (Cas) genes to maintain genomic integrity in the face of relentless assault from pathogens and mobile genetic elements. Type I CRISPR-Cas systems canonically target foreign DNA for degradation via the joint action of the ribonucleoprotein complex Cascade and the helicase-nuclease Cas3, but nuclease-deficient Type I systems lacking Cas3 have been repurposed for RNA-guided transposition by bacterial Tn7-like transposons. How CRISPR- and transposon-associated machineries collaborate during DNA targeting and insertion has remained elusive. Here we determined structures of a novel TniQ-Cascade complex encoded by the Vibrio cholerae Tn6677 transposon using single particle electron cryo-microscopy (cryo-EM), revealing the mechanistic basis of this functional coupling. The quality of the cryo-EM maps allowed for de novo modeling and refinement of the transposition protein TniQ, which binds to the Cascade complex as a dimer in a head-to-tail configuration, at the interface formed by Cas6 and Cas7 near the 3' end of the crRNA. The natural Cas8-Cas5 fusion protein binds the 5' crRNA handle and contacts the TniQ dimer via a flexible insertion domain. A target DNA-bound structure reveals critical interactions necessary for protospacer adjacent motif (PAM) recognition and R-loop formation. The present work lays the foundation for a structural understanding of how DNA targeting by TniQ-Cascade leads to downstream recruitment of additional transposon-associated proteins, and will guide protein engineering efforts to leverage this system for programmable DNA insertions in genome engineering applications.
892 downloads genomics
Cristopher V Van Hout, Ioanna Tachmazidou, Joshua D Backman, Joshua X Hoffman, Bin Yi, Ashutosh Pandey, Claudia Gonzaga-Jauregui, Shareef Khalid, Daren Liu, Nilanjana Banerjee, Alexander H Li, Colm O'Dushlaine, Anthony Marcketta, Jeffrey Staples, Claudia Schumann, Alicia Hawes, Evan Maxwell, Leland Barnard, Alexander Lopez, John Penn, Lukas Habegger, Andrew L Blumenfeld, Ashish Yadav, Kavita Praveen, Marcus Jones, William J Salerno, Wendy K Chung, Ida Surakka, Cristen J. Willer, Kristian Hveem, Joseph B Leader, David J Carey, David H Ledbetter, Lon Cardon, George D Yancopoulos, Aris Economides, Giovanni Coppola, Alan R Shuldiner, Suganthi Balasubramanian, Michael Cantor, Matthew R. Nelson, John C Whittaker, Jeffrey G Reid, Jonathan Marchini, John D Overton, Robert A Scott, Goncalo Abecasis, Laura M Yerges-Armstrong, Aris Baras
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data includes 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.
887 downloads bioinformatics
Taxonomic classification is a crucial step for metagenomics applications including disease diagnostics, microbiome analyses, and outbreak tracing. Yet it is unknown what deep learning architecture can capture microbial genome-wide features relevant to this task. We report DeepMicrobes (https://github.com/MicrobeLab/DeepMicrobes), a computational framework that can perform large-scale training on > 10,000 RefSeq complete microbial genomes and accurately predict the species-of-origin of whole metagenome shotgun sequencing reads. We show the advantage of DeepMicrobes over state-of-the-art tools in precisely identifying species from microbial community sequencing data. Therefore, DeepMicrobes expands the toolbox of taxonomic classification for metagenomics and enables the development of further deep learning-based bioinformatics algorithms for microbial genomic sequence analysis.
884 downloads bioinformatics
De novo genome assembly provides comprehensive, unbiased genomic information and makes it possible to gain insight into new DNA sequences not present in reference genomes. Many de novo human genomes have been published in the last few years, leveraging a combination of inexpensive short-read and single-molecule long-read technologies. As long-read DNA sequencers become more prevalent, the computational burden of generating assemblies persists as a critical factor. The most common approach to long-read assembly, using an overlap-layout-consensus (OLC) paradigm, requires all-to-all read comparisons, which scale quadratically in computational complexity with the number of reads. We assert that recent achievements in sequencing technology (i.e. accuracy ~99% and read length ~10-15k) enable a fundamentally better strategy for OLC that is effectively linear rather than quadratic. Our genome assembly implementation, Peregrine, uses sparse hierarchical minimizers (SHIMMER) to index reads, thereby avoiding the need for an all-to-all read comparison step. Peregrine can assemble 30x human PacBio CCS read datasets in less than 30 CPU hours and around 100 wall-clock minutes to a high contiguity assembly (N50 > 20Mb). The continued advance of sequencing technologies coupled with the Peregrine assembler enables routine generation of human de novo assemblies. This will allow for population scale measurements of more comprehensive genomic variations -- beyond SNPs and small indels -- as well as novel applications requiring rapid access to de novo assemblies.
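The indexing idea in the abstract above can be illustrated with a minimal sketch of plain, single-level minimizers. This is not Peregrine's SHIMMER implementation: the function names, the tiny `k` and `w` values, and the toy reads are all hypothetical, chosen only to show how an index on minimizers proposes overlap candidates without comparing every read against every other read.

```python
from collections import defaultdict


def minimizers(seq, k=5, w=4):
    """Return sorted (position, k-mer) pairs: the lexicographically smallest
    k-mer in every window of w consecutive k-mers (ties go to the leftmost)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)


def build_index(reads, k=5, w=4):
    """Map each minimizer k-mer to the set of read ids that contain it."""
    index = defaultdict(set)
    for read_id, seq in reads.items():
        for _, kmer in minimizers(seq, k, w):
            index[kmer].add(read_id)
    return index


def candidate_pairs(index):
    """Propose overlap candidates: pairs of reads sharing at least one minimizer."""
    pairs = set()
    for ids in index.values():
        ordered = sorted(ids)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                pairs.add((a, b))
    return pairs


# Hypothetical toy reads: r1 and r2 share a 10-base overlap, r3 is unrelated.
reads = {
    "r1": "ACGTACGGTCATGCA",
    "r2": "CGGTCATGCATTTAG",
    "r3": "TTTTTTTTTTTTTTT",
}
print(candidate_pairs(build_index(reads)))
```

Two reads that share an exact stretch of at least `k + w - 1` bases are guaranteed to share a minimizer, so only reads that meet in the index need a detailed comparison. Real assemblers add further layers (hierarchy, repeat filtering, alignment) that this sketch leaves out.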
880 downloads genomics
Davis McCarthy, Raghd Rostom, Yuanhua Huang, Daniel J Kunz, Petr Danecek, Marc Jan Bonder, Tzachi Hagai, HipSci Consortium, Wenyi Wang, Daniel J Gaffney, Benjamin D Simons, Oliver Stegle, Sarah A Teichmann
Decoding the clonal substructures of somatic tissues sheds light on cell growth, development and differentiation in health, ageing and disease. DNA-sequencing, either using bulk or using single-cell assays, has enabled the reconstruction of clonal trees from frequency and co-occurrence patterns of somatic variants. However, approaches to systematically characterize phenotypic and functional variations between individual clones are not established. Here we present cardelino (https://github.com/PMBio/cardelino), a computational method for inferring the clone of origin of individual cells that have been assayed using single-cell RNA-seq (scRNA-seq). After validating our model using simulations, we apply cardelino to matched scRNA-seq and exome sequencing data from 32 human dermal fibroblast lines, identifying hundreds of differentially expressed genes between cells from different somatic clones. These genes are frequently enriched for cell cycle and proliferation pathways, indicating a key role for cell division genes in non-neutral somatic evolution.
876 downloads synthetic biology
In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.
874 downloads developmental biology
During development, forces transmitted between cells are critical for sculpting epithelial tissues. Actomyosin contractility in the middle of the cell apex (medioapical) can change cell shape (e.g., apical constriction), but can also result in force transmission between cells via attachments to adherens junctions. How actomyosin networks maintain attachments to adherens junctions under tension is poorly understood. Here, we discovered that microtubules promote actomyosin intercellular attachments in epithelia during Drosophila mesoderm invagination. First, we used live imaging to show a novel arrangement of the microtubule cytoskeleton during apical constriction: medioapical Patronin (CAMSAP) foci formed by actomyosin contraction organized an apical non-centrosomal microtubule network. Microtubules were required for mesoderm invagination but were not necessary for initiating apical contractility or adherens junction assembly. Instead, microtubules promoted connections between medioapical actomyosin and adherens junctions. These results delineate a role for coordination between actin and microtubule cytoskeletal systems in intercellular force transmission during tissue morphogenesis.
874 downloads biochemistry
Stress granules are condensates of non-translating mRNAs and proteins involved in the stress response and neurodegenerative diseases. Stress granules are proposed to form in part through intermolecular RNA-RNA interactions, although the process of RNA condensation is not well understood. In vitro, we demonstrate that the minimization of surface free energy promotes the recruitment and interaction of RNAs on RNA or RNP condensate surfaces. We demonstrate that the ATPase activity of the DEAD-box RNA helicase eIF4A reduces RNA recruitment to RNA condensates in vitro and in cells, as well as limiting stress granule formation. This defines a new function for eIF4A, and potentially other RNA helicases, to limit thermodynamically favored intermolecular RNA-RNA interactions in cells, thereby allowing for proper RNP function.
872 downloads genetics
Defining the effects that rare variants can have on human phenotypes is essential to advancing our understanding of human health and disease. Large-scale human genetic analyses have thus far focused on common variants, but the development of large cohorts of deeply phenotyped individuals with exome sequence data has now made comprehensive analyses of rare variants possible. We analyzed the effects of rare (MAF<0.1%) variants on 3,166 phenotypes in 40,468 exome-sequenced individuals from the UK Biobank and performed replication as well as meta-analyses with 1,067 phenotypes in 13,470 members of the Healthy Nevada Project (HNP) cohort who underwent Exome+ sequencing at Helix. Our analyses of non-benign coding and loss of function (LoF) variants identified 78 gene-based associations that passed our statistical significance threshold (p<5×10⁻⁹). These are associations in which carrying any rare coding or LoF variant in the gene is associated with an enrichment for a specific phenotype, as opposed to GWAS-based associations of strictly single variants. Importantly, our results do not suffer from the test statistic inflation that is often seen with rare variant analyses of biobank-scale data because of our rare variant-tailored methodology, which includes a step that optimizes the carrier frequency threshold for each phenotype based on prevalence. Of the 47 discovery associations whose phenotypes were represented in the replication cohort, 98% showed effects in the expected direction, and 45% attained formal replication significance (p<0.001). Six additional significant associations were identified in our meta-analysis of both cohorts.
Among the results, we confirm known associations of PCSK9 and APOB variation with LDL levels; we extend knowledge of variation in the TYRP1 gene, previously associated with blonde hair color only in Solomon Islanders, to blonde hair color in individuals of European ancestry; we show that PAPPA, a gene in which common variants had previously been associated with height via GWAS, contains rare variants that decrease height; and we make the novel discovery that STAB1 variation is associated with blood flow in the brain. Our results are available for download and interactive browsing in an app (https://ukb.research.helix.com). This comprehensive analysis of the effects of rare variants on human phenotypes marks one of the first steps in the next big phase of human genetics, where large, deeply phenotyped cohorts with next generation sequence data will elucidate the effects of rare variants.
870 downloads bioinformatics
Single-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology to profile the transcriptomes of individual cells on a large scale. Many early analyses of differential expression have aimed at identifying differences between subpopulations, and thus are focused on finding markers for cell populations either in a single sample or across multiple samples. More generally, such methods can compare expression levels in multiple sets of cells, thus leading to cross-condition analyses. However, given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis. For example, one could investigate the condition-specific responses of cell populations measured from patients from each condition; however, it is not clear which statistical framework best handles this situation. In this work, we surveyed the methods available to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated pseudobulk data. We developed a flexible simulation platform that mimics both single and multi-sample scRNA-seq data and provide robust tools for multi-condition analysis within the muscat R package.
867 downloads scientific communication and education
Using an online survey of academics at 55 randomly selected institutions across the US and Canada, we explore priorities for publishing decisions and their perceived importance within review, promotion, and tenure (RPT). We find that respondents most value journal readership, while they believe their peers most value prestige and related metrics such as impact factor when submitting their work for publication. Respondents indicated that total number of publications, number of publications per year, and journal name recognition were the most valued factors in RPT. Older and tenured respondents (most likely to serve on RPT committees) were less likely to value journal prestige and metrics for publishing, while untenured respondents were more likely to value these factors. These results suggest disconnects between what academics value versus what they think their peers value, and between the importance of journal prestige and metrics for tenured versus untenured faculty in publishing and RPT perceptions.
862 downloads neuroscience
The neuronal protein Arc is a critical mediator of synaptic plasticity. Arc originated in tetrapods and flies through domestication of retrotransposon Gag genes. Recent studies have suggested that Arc mediates intercellular mRNA transfer, and, like Gag, can form capsid-like structures. Here we report that Drosophila proteins dArc1 and dArc2 assemble virus-like capsids. We determine the capsid structures to 2.8 Å and 3.7 Å resolution, respectively, finding similarity to capsids of retroviruses and retrotransposons. Differences between dArc1 and dArc2 capsids, including the presence of a structured zinc-finger pair in dArc1, are consistent with differential RNA binding specificity. Our data support a model in which ancestral capsid-forming and RNA-binding properties of Arc remain under positive selection pressure and have been repurposed to function in neuronal signalling.
862 downloads neuroscience
Fenna M Krienen, Melissa Goldman, Qiangge Zhang, Ricardo del Rosario, Marta Florio, Robert Machold, Arpiar Saunders, Kirsten Levandowski, Heather Zaniewski, Benjamin Schuman, Carolyn Wu, Alyssa Lutservitz, Christopher D Mullally, Nora Reed, Elizabeth Bien, Laura Bortolin, Marian Fernandez-Otero, Jessica Lin, Alec Wysoker, James Nemesh, David Kulp, Monika Burns, Victor Tkachev, Richard Smith, Christopher A. Walsh, Jordane Dimidschstein, Bernardo Rudy, Leslie Kean, Sabina Berretta, Gordon Fishell, Guoping Feng, Steven A McCarroll
Primates and rodents, which descended from a common ancestor more than 90 million years ago, exhibit profound differences in behavior and cognitive capacity. Modifications, specializations, and innovations to brain cell types may have occurred along each lineage. We used Drop-seq to profile RNA expression in more than 184,000 individual telencephalic interneurons from humans, macaques, marmosets, and mice. Conserved interneuron types varied significantly in abundance and RNA expression between mice and primates, but varied much more modestly among primates. In adult primates, the expression patterns of dozens of genes exhibited spatial expression gradients among neocortical interneurons, suggesting that adult neocortical interneurons are imprinted by their local cortical context. In addition, we found that an interneuron type previously associated with the mouse hippocampus--the "ivy cell", which has neurogliaform characteristics--has become abundant across the neocortex of humans, macaques, and marmosets. The most striking innovation was subcortical: we identified an abundant striatal interneuron type in primates that had no molecularly homologous cell population in mouse striatum, cortex, thalamus, or hippocampus. These interneurons, which expressed a unique combination of transcription factors, receptors, and neuropeptides, including the neuropeptide TAC3, constituted almost 30% of striatal interneurons in marmosets and humans. Understanding how gene and cell-type attributes changed or persisted over the evolutionary divergence of primates and rodents will guide the choice of models for human brain disorders and mutations and help to identify the cellular substrates of expanded cognition in humans and other primates.
861 downloads plant biology
The molecular codes underpinning the functions of plant NLR immune receptors are poorly understood. We used in vitro Mu transposition to generate a random truncation library and identify the minimal functional region of NLRs. We applied this method to NRC4, a helper NLR that functions with multiple sensor NLRs within a Solanaceae receptor network. This revealed that the NRC4 N-terminal 29 amino acids are sufficient to induce hypersensitive cell death. This region is defined by the consensus MADAxVSFxVxKLxxLLxxEx (MADA motif) that is conserved at the N-termini of NRC family proteins and ~20% of coiled-coil (CC)-type plant NLRs. The MADA motif matches the N-terminal α1 helix of Arabidopsis NLR protein ZAR1, which undergoes a conformational switch during resistosome activation. Immunoassays revealed that the MADA motif is functionally conserved across NLRs from distantly related plant species. NRC-dependent sensor NLRs lack MADA sequences indicating that this motif has degenerated in sensor NLRs over evolutionary time.
836 downloads clinical trials
Background: Evaluation of efficacy, safety and feasibility of hyperthermic baths (HTB; head-out-of-water-immersion in 40°C), twice a week, compared to a physical exercise program (PEP; moderate intensity aerobic exercises) in moderate to severe depression. Method: Single-site, open-label randomized controlled 8-week parallel-group pilot study at a university outpatient clinic as part of usual depression care. Medically stable outpatients with depressive disorder (ICD-10: F32/F33) as determined by the 17-item Hamilton Depression Rating Scale (HAM-D) score ≥18 and a score ≥2 on item 1 (Depressed Mood) were randomly assigned to receive either two sessions of HTB or PEP per week (40-45 min) provided by two trained doctoral students. An independent biometric center used computer-generated tables to allocate treatments. Primary outcome measure was the change in HAM-D total score from baseline (T0) to the 2-week time point (T1). Linear regression analyses, adjusted for baseline values, were performed to estimate intervention effects on an intention-to-treat (ITT) principle. Findings: 45 patients (HTB n = 22; PEP n = 23) were randomized and analyzed according to ITT (mean age = 48.4 years, SD = 11.3, mean HAM-D score = 21.7, SD = 3.2). Baseline-adjusted mean difference was 4.3 points in the HAM-D score in favor of HTB (p<0.001). This improvement was achieved after two weeks. Compliance with the intervention and follow-up was far better in the HTB group (2 vs 13 dropouts). There were no treatment-related serious adverse events. Main limitation: the number of dropouts in the PEP group (13 of 23) was far higher than in other trials investigating exercise in depression (18.1% dropouts). Conclusions: HTB seems to be a fast-acting, safe and easily accessible method leading to clinically relevant improvement in depressive disorder after two weeks; it is also suitable for persons who have problems performing exercise training. Clinical Trial registration ID #DRKS00011013.
822 downloads genomics
Lee et al. (hereafter "the Lee study") have recently reported that RNA-mediated somatic recombination or somatic retrotransposition of the APP gene occurs in neurons from both control individuals and those with sporadic Alzheimer's disease (AD). As evidence of somatic APP retrotransposition, the authors present various forms of APP genomic cDNA (gencDNA) in PCR-based (Sanger sequencing, SMRT sequencing) and non-PCR-based (targeted hybrid-capture sequencing, DNA in situ hybridization (DISH)) experiments. They also report greater prevalence of APP gencDNA in AD neurons compared to control neurons (69% vs 25% of neurons with at least one APP retrogene insertion on average, Fig. 5 and Extended Data Fig. 5 in the Lee study) as well as its greater diversity. We reanalyzed the APP-targeted sequencing data from the Lee study, revealing evidence that APP gencDNA originates mainly from contamination by exogenous APP recombinant vectors, rather than from true somatic retrotransposition of endogenous APP. We also present our own single-cell whole-genome sequencing (scWGS) data that show no evidence for somatic APP retrotransposition in AD neurons or in neurons from normal individuals of various ages.
822 downloads genomics
Aaron M Wenger, Paul Peluso, William J Rowell, Pi-Chuan Chang, Richard Hall, Gregory T. Concepcion, Jana Ebler, Arkarachai Fungtammasan, Alexey Kolesnikov, Nathan D Olson, Armin Toepfer, Michael Alonge, Medhat Mahmoud, Yufeng Qian, Chen-Shan Chin, Adam M Phillippy, Michael C. Schatz, Gene Myers, Mark A. DePristo, Jue Ruan, Tobias Marschall, Fritz J. Sedlazeck, Justin M Zook, Heng Li, Sergey Koren, Andrew Carroll, David R Rank, Michael W Hunkapiller
The major DNA sequencing technologies in use today produce either highly-accurate short reads or noisy long reads. We developed a protocol based on single-molecule, circular consensus sequencing (CCS) to generate highly-accurate (99.8%) long reads averaging 13.5 kb and applied it to sequence the well-characterized human HG002/NA24385. We optimized existing tools to comprehensively detect variants, achieving precision and recall above 99.91% for SNVs, 95.98% for indels, and 95.99% for structural variants. We estimate that 2,434 discordances are correctable mistakes in the high-quality Genome in a Bottle benchmark. Nearly all (99.64%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome with contig N50 above 15 Mb and concordance of 99.998%. CCS reads match short reads for small variant detection, while enabling structural variant detection and de novo assembly at similar contiguity and markedly higher concordance than noisy long reads.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!