Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 48,092 bioRxiv papers from 215,637 authors.
Most tweeted bioRxiv papers, last 24 hours
390 results found.
280 tweets neuroscience
The ability to read out, or decode, mental content from brain activity has significant practical and scientific implications. For example, technology that translates cortical activity into speech would be transformative for people unable to communicate as a result of neurological impairment. Decoding speech from neural activity is challenging because speaking requires extremely precise and dynamic control of multiple vocal tract articulators on the order of milliseconds. Here, we designed a neural decoder that explicitly leverages the continuous kinematic and sound representations encoded in cortical activity to generate fluent and intelligible speech. A recurrent neural network first decoded vocal tract physiological signals from direct cortical recordings, and then transformed them to acoustic speech output. Robust decoding performance was achieved with as little as 25 minutes of training data. Naive listeners were able to accurately identify these decoded sentences. Additionally, speech decoding was not only effective for audibly produced speech, but also when participants silently mimed speech. These results advance the development of speech neuroprosthetic technology to restore spoken communication in patients with disabling neurological disorders.
74 tweets bioinformatics
Long-read RNA sequencing (RNA-Seq) is promising for transcriptomics studies; however, aligning the reads remains a fundamental but non-trivial task because of sequencing errors and complicated gene structures. We propose deSALT, a tailored two-pass long RNA-seq read alignment approach, which constructs graph-based alignment skeletons to sensitively infer exons and uses them to generate spliced reference sequences to produce refined alignments. deSALT addresses several difficult issues, such as small exons, serious sequencing errors and consensus spliced alignment. Benchmarks demonstrate that this approach has a better ability to produce high-quality full-length alignments, which has enormous potential for transcriptomics studies.
66 tweets evolutionary biology
Although homologous recombination is accepted to be common in bacteria, so far it has been challenging to accurately quantify its impact on genome evolution within bacterial species. We here introduce methods that use the statistics of single-nucleotide polymorphism (SNP) splits in the core genome alignment of a set of strains to show that, for many bacterial species, recombination dominates genome evolution. Each genomic locus has been overwritten so many times by recombination that it is impossible to reconstruct the clonal phylogeny and, instead of a consensus phylogeny, the phylogeny typically changes many thousands of times along the core genome alignment. We also show how SNP splits can be used to quantify the relative rates with which different subsets of strains have recombined in the past. We find that virtually every strain has a unique pattern of recombination frequencies with other strains and that the relative rates with which different subsets of strains share SNPs follow long-tailed distributions. Our findings show that bacterial populations are neither clonal nor freely recombining, but structured such that recombination rates between different lineages vary along a continuum spanning several orders of magnitude, with a unique pattern of rates for each lineage. Thus, rather than reflecting clonal ancestry, whole genome phylogenies reflect these long-tailed distributions of recombination rates.
45 tweets microbiology
Natural competence for transformation is a primary mode of horizontal gene transfer (HGT). Competent bacteria are able to absorb free DNA from their surroundings and exchange this DNA against pieces of their own genome when sufficiently homologous. And while it is known that transformation contributes to evolution and pathogen emergence in bacteria, there are still questions regarding the general prevalence of non-degraded DNA with sufficient coding capacity. In this context, we previously showed that the naturally competent bacterium Vibrio cholerae uses its type VI secretion system (T6SS) to actively acquire DNA from non-kin neighbors under chitin-colonizing conditions. We therefore sought to further explore the role of the T6SS in acquiring DNA, the condition of the DNA released through T6SS-mediated killing versus passive cell lysis, and the extent of the transfers that occur due to these conditions. To do this, we herein measured the frequency and the extent of genetic exchanges in bacterial co-cultures on competence-inducing chitin under various DNA-acquisition conditions. We show that competent V. cholerae strains acquire DNA fragments with an average and maximum length exceeding 50 kbp and 150 kbp, respectively, and that the T6SS is of prime importance for such HGT events. Collectively, our data support the notion that the environmental lifestyle of V. cholerae fosters HGT and that the coding capacity of the exchanged genetic material is sufficient to significantly accelerate bacterial evolution.
44 tweets molecular biology
Hand-over-hand translocation is emerging as the conserved mechanism by which ATP hydrolysis drives substrate translocation within the classical clade of AAA+ proteins. However, the operating principles of the distantly related HCLR clade, which includes the important quality control protease Lon, remain poorly defined. We determined a cryo-electron microscopy structure of Y. pestis Lon trapped in the act of processing substrate. This structure revealed that sequential ATP hydrolysis and hand-over-hand substrate translocation are conserved in this AAA+ protease. However, Lon processes substrates through a distinct molecular mechanism involving structural features unique to the HCLR clade. Our findings define a previously unobserved translocation mechanism that is likely conserved across HCLR proteins and reveal how fundamentally distinct structural configurations of distantly related AAA+ enzymes can power hand-over-hand substrate translocation.
36 tweets neuroscience
Kiryl D. Piatkevich, Seth Bensussen, Hua-an Tseng, Sanaya N. Shroff, Violetta Giselle Lopez-Huerta, Demian Park, Erica E. Jung, Or A. Shemesh, Christoph Straub, Howard J Gritton, Michael F. Romano, Emma Costa, Bernardo L. Sabatini, Zhanyan Fu, Edward S Boyden, Xue Han
A longstanding goal in neuroscience has been to image membrane voltage, with high temporal precision and sensitivity, in awake behaving mammals. Here, we report a genetically encoded voltage indicator, SomArchon, which exhibits millisecond response times and compatibility with optogenetic control, and which increases the sensitivity, signal-to-noise ratio, and number of neurons observable, by manyfold over previous reagents. SomArchon only requires conventional one-photon microscopy to achieve these high performance characteristics. These improvements enable population analysis of neural activity, both at the subthreshold and spiking levels, in multiple brain regions: cortex, hippocampus, and striatum of awake behaving mice. Using SomArchon, we detect both positive and negative responses of striatal neurons during movement, highlighting the power of voltage imaging to reveal bidirectional modulation. We also examine how the intracellular subthreshold theta oscillations of hippocampal neurons govern spike output, finding that nearby cells can exhibit highly correlated subthreshold activities, even as they generate highly divergent spiking patterns.
32 tweets microbiology
Manuel Saldivia, Srinivasa P.S. Rao, Eric Fang, Elmarie Myburgh, Elaine Brown, Adam J. M. Wollman, Ryan Ritchie, Suresh B Lakhsminarayana, Yen Liang Chen, Debjani Patra, Hazel X Koh, Sarah Williams, Frantisek Supek, Daniel Paape, Christopher Bower-Lepts, Mark C. Leake, Richard McCulloch, Marcel Kaiser, Michael P Barrett, Jan Jiricek, Thierry T. Diagana, Jeremy C Mottram
The kinetochore is a macromolecular structure that assembles on the centromeres of chromosomes and provides the major attachment point for spindle microtubules during mitosis. In Trypanosoma brucei the proteins that make up the kinetochore are highly divergent, with the inner kinetochore comprising at least 20 distinct and essential proteins (KKT1-20) that include four protein kinases, CLK1 (KKT10), CLK2 (KKT19), KKT2 and KKT3. We performed a phenotypic screen of T. brucei bloodstream forms with a Novartis kinase-focused inhibitor library, which identified a number of selective inhibitors with potent pan-kinetoplastid activity. Deconvolution of an amidobenzimidazole series using a selection of 37 T. brucei mutants that over-express known essential protein kinases identified CLK1 as the primary target. Biochemical studies show that the irreversible competitive inhibition of CLK1 is dependent on a Michael acceptor forming an irreversible bond with C215 in the ATP binding pocket, a residue that is not present in human CLK1, thereby providing selectivity. Chemical inhibition of CLK1 impairs inner kinetochore recruitment and compromises cell cycle progression, leading to cell death. We show that KKT2 is a substrate for CLK1 and identify phosphorylation of S508 to be essential for KKT2 function and for kinetochore assembly. We propose that CLK1 is part of a novel signalling cascade that controls kinetochore function via phosphorylation of the inner kinetochore protein kinase KKT2. This work highlights a novel drug target for trypanosomatid parasitic protozoa and a new chemical tool for investigating the function of their divergent kinetochores.
30 tweets genetics
Pierrick Wainschtein, Deepti P Jain, Loic Yengo, Zhili Zheng, TOPMed Anthropometry Working Group, Trans-Omics for Precision Medicine Consortium, L Adrienne Cupples, Aladdin H Shadyab, Barbara McKnight, Benjamin M Shoemaker, Braxton D Mitchell, Bruce M Psaty, Charles Kooperberg, Dan Roden, Dawood Darbar, Donna K. Arnett, Elizabeth A Regan, Eric Boerwinkle, Jerome I Rotter, Matthew A Allison, Merry-Lynn N McDonald, Mina K. Chung, Nicholas L Smith, Patrick T Ellinor, Ramachandran S Vasan, Rasika A. Mathias, Stephen S Rich, Susan R Heckbert, Susan Redline, Xiuqing Guo, Y-D Ida Chen, Ching-Ti Liu, Mariza de Andrade, Lisa R. Yanek, Christine M Albert, Ryan D. Hernandez, Stephen T McGarvey, Kari E. North, Leslie A Lange, Bruce S. Weir, Cathy C. Laurie, Jian Yang, Peter M. Visscher
Heritability, the proportion of phenotypic variance explained by genetic factors, can be estimated from pedigree data, but such estimates are uninformative with respect to the underlying genetic architecture. Analyses of data from genome-wide association studies (GWAS) on unrelated individuals have shown that for human traits and disease, approximately one-third to two-thirds of heritability is captured by common SNPs. It is not known whether the remaining heritability is due to the imperfect tagging of causal variants by common SNPs, in particular if the causal variants are rare, or other reasons such as over-estimation of heritability from pedigree data. Here we show that pedigree heritability for height and body mass index (BMI) appears to be fully recovered from whole-genome sequence (WGS) data on 21,620 unrelated individuals of European ancestry. We assigned 47.1 million genetic variants to groups based upon their minor allele frequencies (MAF) and linkage disequilibrium (LD) with variants nearby, and estimated and partitioned variation accordingly. The estimated heritability was 0.79 (SE 0.09) for height and 0.40 (SE 0.09) for BMI, consistent with pedigree estimates. Low-MAF variants in low LD with neighbouring variants were enriched for heritability, to a greater extent for protein altering variants, consistent with negative selection thereon. Cumulatively variants in the MAF range of 0.0001 to 0.1 explained 0.54 (SE 0.05) and 0.51 (SE 0.11) of heritability for height and BMI, respectively. Our results imply that the still missing heritability of complex traits and disease is accounted for by rare variants, in particular those in regions of low LD.
29 tweets genetics
Depression is the leading cause of worldwide disability but there remains considerable uncertainty regarding its neural and behavioural associations. Depression is known to be heritable with a polygenic architecture, and results from genome-wide association studies are providing summary statistics with increasing polygenic signal that can be used to estimate genetic risk scores for prediction in independent samples. This provides a timely opportunity to identify traits that are associated with polygenic risk of depression in the large and consistently phenotyped UK Biobank sample. Using the Psychiatric Genomics Consortium (PGC), 23andMe and non-imaging UK Biobank datasets as reference samples, we estimated polygenic risk scores for depression (depression-PRS) in a discovery sample of 10,674 people and a replication sample of 11,214 people from the UK Biobank Imaging Study, testing for associations with 210 behavioural and 278 neuroimaging phenotypes. In the discovery sample, 93 traits were significantly associated with depression-PRS after multiple testing correction. Among these, 92 traits were in the same direction, and 69 were significant in the replication analysis. For imaging traits that replicated across samples, higher depression-PRS was associated with lower global white matter microstructure, association-fibre and thalamic-radiation microstructural integrity (absolute β: 0.023 to 0.040, pFDR: 0.045 to 3.92×10⁻⁴). Mendelian Randomisation analysis showed a causal effect of liability to depression on these structural brain measures (β: 0.125 to 0.707, pFDR<0.048). Replicated behavioural traits that positively associated with depression-PRS included sleep problems, smoking status, measures of pain and stressful life experiences, and those negatively associated with depression-PRS included subjective ratings of physical health (absolute β: 0.014 to 0.180, pFDR: 0.046 to 8.54×10⁻¹⁵).
The variance in mental health explained by depression-PRS was 1.42 to 4.08 times greater in individuals who reported childhood trauma or stressful life events, or who lived in more socially deprived areas (pFDR for their interaction with depression-PRS: 0.049 to 0.003). Overall, the present study revealed replicable associations between depression-PRS and white matter microstructure that appeared to be a causal consequence of liability to depression. Analyses provided further evidence that greater effects of polygenic risk of depression are found in individuals exposed to risk-conferring environments.
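As a rough illustration of what a polygenic risk score is, the sketch below computes the standard weighted sum of allele dosages, with per-variant effect sizes taken from a reference GWAS. The function name, the dosages and the effect sizes are all hypothetical toy values, not data or code from the study, and real pipelines add steps (variant selection, LD handling, standardisation) that are omitted here.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele).

    dosages      -- one individual's dosage per variant (hypothetical toy input)
    effect_sizes -- per-variant effect estimates from a reference GWAS
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(d * beta for d, beta in zip(dosages, effect_sizes))


# Hypothetical toy data: four variants, three individuals.
toy_effects = [0.10, -0.05, 0.20, 0.02]
toy_individuals = [
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
toy_scores = [polygenic_risk_score(d, toy_effects) for d in toy_individuals]
print(toy_scores)
```

Association testing of the kind described above would then regress each trait on these scores, usually after standardising them across the sample.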
27 tweets genomics
The usability of publicly-available gene expression data is often limited by the availability of high-quality, standardized biological phenotype and experimental condition information ("metadata"). We released the recount2 project, which involved re-processing ~70,000 samples in the Sequence Read Archive (SRA), Genotype-Tissue Expression (GTEx), and The Cancer Genome Atlas (TCGA) projects. While samples from the latter two projects are well-characterized with extensive metadata, the ~50,000 RNA-seq samples from SRA in recount2 are inconsistently annotated with metadata. Tissue type, sex, and library type can be estimated from the RNA sequencing (RNA-seq) data itself. However, more detailed and harder to predict metadata, like age and diagnosis, must ideally be provided by labs that deposit the data. To facilitate more analyses within human brain tissue data, we have complemented phenotype predictions by manually constructing a uniformly-curated database of public RNA-seq samples present in SRA and recount2. We describe the reproducible curation process for constructing recount-brain that involves systematic review of the primary manuscript, which can serve as a guide to annotate other studies and tissues. We further expanded recount-brain by merging it with GTEx and TCGA brain samples as well as linking to controlled vocabulary terms for tissue, Brodmann area and disease. Furthermore, we illustrate how to integrate the sample metadata in recount-brain with the gene expression data in recount2 to perform differential expression analysis. We then provide three analysis examples involving modeling postmortem interval, glioblastoma, and meta-analyses across GTEx and TCGA. Overall, recount-brain facilitates expression analyses and improves their reproducibility as individual researchers do not have to manually curate the sample metadata.
recount-brain is available via the add_metadata() function from the recount Bioconductor package at bioconductor.org/packages/recount.
25 tweets genetics
The notion that behaviour may be on a causal path from genetics to psychiatric disorders, such as schizophrenia, highlights a potential for practical interventions. Motivated by this, we test the association between schizophrenia (SCZ) polygenic risk scores (PRS) and 420 behavioural traits (personality, psychological, lifestyle, nutritional) in a psychiatrically healthy sub-cohort of the UK Biobank. Higher schizophrenia PRS was associated with a range of traits, including lower verbal-numerical reasoning (P = 6×10⁻⁶¹), higher nervous feelings (P = 2×10⁻⁵¹) and higher self-reported risk-taking (P = 2×10⁻⁴¹). We follow up the risk-taking association, hypothesising that the association may be due to a genetic propensity for risk-taking leading to greater migration, urbanicity or drug-taking − reported environmental risk factors for schizophrenia, and all positively associated with risk-taking in these data. However, schizophrenia PRS was also associated with traits, such as tea drinking (P = 2×10⁻³⁴), that are highly unlikely to be on a causal path to schizophrenia. We depict four causal relationships that may in theory underlie such PRS-trait associations and illustrate ways of testing for each. For example, we contrast PRS-trait trends in the healthy sub-cohort to the corresponding trait values of medicated and non-medicated individuals diagnosed with schizophrenia, allowing some differentiation of mediation-by-behaviour, disease-onset effects and treatment effects. However, dedicated follow-up studies and new methods are required to fully disentangle these relationships. Thus, while we urge caution in interpretation of simple PRS cross-trait associations, we propose that well-designed PRS analyses can contribute to identifying behaviours on the causal path from genetics to disease.
23 tweets neuroscience
Machine learning is a powerful set of techniques that has enhanced the abilities of neuroscientists to interpret information collected through EEG, fMRI, MEG, and PET data. With these new techniques come new dangers of overfitting that are not well understood by the neuroscience community. In this article, we use Support Vector Machine (SVM) classifiers and genetic algorithms to demonstrate the ease with which overfitting can occur, despite the use of cross validation. We demonstrate that comparable and non-generalizable results can be obtained on informative and non-informative (i.e. random) data by iteratively modifying hyperparameters in seemingly innocuous ways. We recommend a number of techniques for limiting overfitting, such as lock boxes, blind analyses, and pre-registrations. These techniques, although uncommon in neuroscience applications, are common in many other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques.
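The effect described above can be sketched on pure noise. The example below is an illustration only: a nearest-centroid classifier and a random search over feature subsets stand in for the SVMs and genetic algorithms named in the abstract, and every name, size and seed is an assumption. Repeatedly "tuning" against the cross-validation score picks a configuration that looks better than chance, while a lock box that was never touched during tuning gives an honest estimate.

```python
import random


def fit_centroids(X, y, rows, features):
    """Mean of each selected feature per class, using only the given rows."""
    centroids = {}
    for label in (0, 1):
        members = [X[i] for i in rows if y[i] == label]
        centroids[label] = [sum(x[f] for x in members) / len(members) for f in features]
    return centroids


def predict(centroids, x, features):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def sq_dist(label):
        return sum((x[f] - c) ** 2 for f, c in zip(features, centroids[label]))
    return min((0, 1), key=sq_dist)


def cv_accuracy(X, y, features, n_folds=5):
    """k-fold cross-validated accuracy; folds are assigned by row index."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        centroids = fit_centroids(X, y, train, features)
        correct += sum(predict(centroids, X[i], features) == y[i] for i in test)
    return correct / n


rng = random.Random(0)
n_features = 50


def random_dataset(n):
    """Gaussian noise features with balanced labels that carry no signal."""
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [0, 1] * (n // 2)
    rng.shuffle(y)
    return X, y


X_dev, y_dev = random_dataset(80)    # used freely during "tuning"
X_lock, y_lock = random_dataset(40)  # lock box: opened once, at the very end

# "Tuning": try many feature subsets and keep whichever cross-validates best.
candidates = [rng.sample(range(n_features), 5) for _ in range(200)]
cv_scores = [cv_accuracy(X_dev, y_dev, feats) for feats in candidates]
best_cv = max(cv_scores)
best_features = candidates[cv_scores.index(best_cv)]

# Single evaluation of the selected configuration on the lock box.
final_model = fit_centroids(X_dev, y_dev, range(len(X_dev)), best_features)
hits = sum(predict(final_model, x, best_features) == t for x, t in zip(X_lock, y_lock))
lockbox_accuracy = hits / len(X_lock)

print(f"best cross-validated accuracy: {best_cv:.2f}")
print(f"lock-box accuracy:             {lockbox_accuracy:.2f}")
```

Because the labels are random, any gap between the two printed numbers reflects selection on the cross-validation score rather than real signal.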
22 tweets plant biology
Assembling meaningful comparisons between species is a major limitation in studying the evolution of organismal form. To understand development in maize and sorghum, closely-related species with architecturally distinct inflorescences, we collected RNAseq profiles encompassing inflorescence body plan specification in both species. We reconstructed molecular ontogenies from 40 B73 maize tassels and 47 BTx623 sorghum panicles and separated them into transcriptional stages. To discover new markers of inflorescence development, we used random forest machine learning to determine stage by RNAseq. We used two descriptions of transcriptional conservation to identify hourglass-like developmental stages. Despite short evolutionary ancestry of 12 million years, we found maize and sorghum inflorescences are most different during their hourglass-like stages of development, following an 'inverse-hourglass' model of development. We discuss if agricultural selection may account for the rapid divergence signatures in these species and the observed separation of evolutionary pressure and developmental reprogramming.
20 tweets ecology
Natural microbial communities contain hundreds to thousands of interacting species. For this reason, computational simulations are playing an increasingly important role in microbial ecology. In this manuscript, we present a new open-source, freely available Python package called Community Simulator for simulating microbial population dynamics in a reproducible, transparent and scalable way. The package includes five major elements: tools for preparing the initial states and environmental conditions for a set of samples, automatic generation of dynamical equations based on a dictionary of modeling assumptions, random parameter sampling with tunable levels of metabolic and taxonomic structure, parallel integration of the dynamical equations, and support for metacommunity dynamics with migration between samples. To significantly speed up simulations using Community Simulator, our Python package implements a new Expectation-Maximization (EM) algorithm for finding equilibrium states of community dynamics that exploits a recently discovered duality between ecological dynamics and convex optimization. We present data showing that this EM algorithm improves performance by between one and two orders of magnitude compared to direct numerical integration of the corresponding ordinary differential equations. We conclude by discussing possible applications and extensions of the Community Simulator package.
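For readers unfamiliar with the "direct numerical integration" baseline mentioned above, here is a minimal sketch of it for a chemostat-style consumer-resource model. It does not use the Community Simulator API: the function names, the forward-Euler scheme and the toy parameters are all assumptions chosen for illustration.

```python
def consumer_resource_step(N, R, c, m, R0, tau, dt):
    """One forward-Euler step of a chemostat-style consumer-resource model.

    dN_i/dt = N_i * (sum_a c[i][a] * R_a - m_i)
    dR_a/dt = (R0_a - R_a) / tau - R_a * sum_i c[i][a] * N_i
    """
    n_species, n_resources = len(N), len(R)
    growth = [sum(c[i][a] * R[a] for a in range(n_resources)) - m[i]
              for i in range(n_species)]
    uptake = [sum(c[i][a] * N[i] for i in range(n_species))
              for a in range(n_resources)]
    new_N = [N[i] + dt * N[i] * growth[i] for i in range(n_species)]
    new_R = [R[a] + dt * ((R0[a] - R[a]) / tau - R[a] * uptake[a])
             for a in range(n_resources)]
    return new_N, new_R


def integrate(N, R, c, m, R0, tau=1.0, dt=0.01, steps=10000):
    """Integrate the model for steps * dt time units and return the final state."""
    for _ in range(steps):
        N, R = consumer_resource_step(N, R, c, m, R0, tau, dt)
    return N, R


# Hypothetical one-species, one-resource community.
# Analytical equilibrium for these toy parameters:
#   R* = m / c = 0.5 and N* = (R0 - R*) / (tau * m) = 3.0
N_final, R_final = integrate(N=[0.1], R=[2.0], c=[[1.0]], m=[0.5], R0=[2.0])
print(N_final, R_final)
```

Stepping the equations until they settle is simple but slow for large communities, which is the cost an equilibrium-finding algorithm like the one in the abstract is meant to avoid.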
19 tweets plant biology
Juan Alonso Serra, Xueping Shi, Alexis Peaucelle, Pasi Rastas, Matthieu Bourdon, Juha Immanen, Junko Takahashi, Hanna Koivula, Gugan Eswaran, Sampo Muranen, Hanna Help-Rinta-Rahko, Olli-Pekka Smolander, Chang Su, Omid Safronov, Lorenz Gerber, Jarkko Salojarvi, Risto Hagqvist, Ari-Pekka Mahonen, Kaisa Nieminen, Yka Helariutta
The remarkable vertical and radial growth observed in tree species poses a major physical challenge for wood-forming tissues. To compensate for increasing size and weight, cambium-derived radial growth increases the stem width, thereby supporting the aerial body of trees. This feedback appears to be part of a so-called 'proprioception' (1, 2) mechanism that controls plant size and biomass allocation. Yet how trees experience or respond to mechanical stress derived from their own vertical loading remains unknown. Here, we combined two strategies to dissect the proprioceptive response in birch. First, we show that in response to physical loading, trees promote radial growth with different magnitudes along the stem. Next, we identified a mutant cultivar (B. pubescens cv. Elimaki) in which the main stem shows normal vertical development, but collapses after three months. By inducing precocious flowering, we generated a backcrossed population (BC1) by producing two generations in 4 years. In this scheme, we uncovered a recessive trait (eki) that segregates and genetically maps with a Mendelian monogenic pattern. Unlike WT, eki is resistant to vertical mechanical stimulation. However, eki responds normally to the gravitropic stimulus by making tension wood. Before the collapse, cell size in eki is compromised, resulting in radial growth defects, depending on stem height. Cell walls of developing xylem and phloem tissues have delayed differentiation in eki, and its tissues are softer compared to WT as indicated by atomic force microscopy (AFM). The transcriptomic profile of eki highlighted the overlap with that of the Arabidopsis response to touch. Taken together, our results suggest that the mechanical environment and cell wall properties of developing woody tissues can significantly affect the growth responses to vertical loading, thereby compromising their proprioceptive capacity.
Additionally, we introduce a fast forward genetics strategy to dissect complex phenotypes in trees.
19 tweets genomics
Generally small effective population sizes expose island species to inbreeding and loss of genetic variation. The Raso lark has been restricted to a single islet for ~500 years, with a population size of a few hundred. To investigate the factors shaping genetic diversity in the species, we assembled a reference genome for the related Eurasian skylark and then assessed genomic diversity and demographic history using RAD-seq data (26 Raso lark samples and 52 samples from its two most closely related mainland species). Genetic diversity in the Raso lark is lower than in its mainland relatives, but is nonetheless considerably higher than anticipated given its recent population size. We found that suppressed recombination on large neo-sex chromosomes maintains divergent alleles across 13% of the genome in females, leading to a two-fold increase in overall diversity in the population. Moreover, we infer that the population contracted from a much larger size recently enough, relative to the long generation time of the Raso lark, that much of the pre-existing genetic variation persists. Nevertheless, the current small population size is likely to lead to considerable inbreeding. Overall, our findings allow for optimism about the ongoing reintroduction of Raso larks to a nearby island, but also highlight the urgency of this effort.
18 tweets neuroscience
Interacting sets of nodes and fluctuations in their interaction are important properties of a dynamic network system. In some cases the edges reflecting these interactions are directly quantifiable from the data collected; however, in many cases (such as functional magnetic resonance imaging (fMRI) data) the edges must be inferred from statistical relations between the nodes. Here we present a new method, called temporal communities by trajectory clustering (TCTC), that derives time-varying communities directly from time series data collected from the nodes in a network. First, we verify TCTC on resting and task fMRI data by showing that time-averaged results correspond with expected static connectivity results. We then show that the time-varying communities correlate with and predict single-trial behaviour. This new perspective on temporal community detection of node-collected data identifies robust communities revealing ongoing spatial-temporal community configurations during task performance.
18 tweets pharmacology and toxicology
Celeste B Greer, Shane Poplawski, Krassimira A Garbett, Rebekah L McMahan, Holly B Kordasiewicz, Hien Zhao, Andrew J. Kennedy, Slavina B Goleva, Teresa H Sanders, Timothy Motley, Eric E Swayze, David J Ecker, Todd P Michael, David Sweatt
The memory suppressor gene histone deacetylase 2 (Hdac2) is a target of small molecule inhibitors under investigation for their effects on cognitive enhancement and treating disorders of memory. The therapeutic compounds currently available are not completely specific to the Hdac2 isoform, and have short half-lives. Antisense oligonucleotides (ASOs) are FDA-approved to treat several diseases. They are a class of drugs that base pair with their target RNA and their effects are extremely long lasting compared to small molecule inhibitors. We utilized an ASO to specifically reduce Hdac2 messenger RNA (mRNA) quantities. We explored the longevity and mechanism of mRNA repression of our Hdac2-specific ASO. A single dose of the Hdac2-targeted ASO injected into the central nervous system diminished Hdac2 mRNA levels for at least 4 months in the brain, and knockdown of this factor resulted in significant memory enhancement. RNA-seq analysis of brain tissues revealed that reducing Hdac2 mRNA with the ASO caused alteration of steady-state levels for other memory-associated mRNAs. In looking at target knockdown in cultured neurons, we observed that our Hdac2-targeted ASO suppresses Hdac2 mRNA as well as an Hdac2 non-coding RNA. Importantly, we found that the ASO not only triggered a reduction in mRNA levels, but also elicited a direct transcriptional suppression of the Hdac2 gene by blocking RNA polymerase II elongation. These findings suggest transcriptional suppression of the target gene as a potential novel mechanism of action of ASOs, and open up the possibility of using ASOs to achieve lasting gene silencing without altering the nucleotide sequence of a gene.
17 tweets bioinformatics
Jerven Bolleman, Edouard de Castro, Delphine Baratin, Sebastien Gehant, Beatrice A Cuche, Andrea Auchincloss, Elisabeth COUDERT, Chantal Hulo, Patrick Masson, Ivo Pedruzzi, Catherine Rivoire, Ioannis Xenarios, Nicole Redaschi, Alan Bridge
Motivation: Genome and proteome annotation pipelines are generally custom built and therefore not easily reusable by other groups, which leads to duplication of effort, increased costs, and suboptimal results. One cost-effective way to increase the data quality in public databases is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation. Results: We have translated the rules of our HAMAP proteome annotation pipeline to queries in the W3C standard SPARQL 1.1 syntax and applied them with two off-the-shelf SPARQL engines to UniProtKB/Swiss-Prot protein sequences described in RDF format. This approach is applicable to any genome or proteome annotation pipeline and greatly simplifies their reuse. Availability: HAMAP SPARQL rules and documentation are freely available for download from the HAMAP FTP site ftp://ftp.expasy.org/databases/hamap/hamap_sparql.tar.gz under a CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 license.
17 tweets bioinformatics
Sample index hopping can substantially confound the analysis of multiplexed sequencing data due to the resulting erroneous assignment of some, or even all, of the sequencing reads generated by a cDNA fragment in a given sample to other samples. In those target samples, the data cross-contamination artifact takes the form of phantom molecules, molecules that exist only in the data by virtue of read misassignment. Phantom molecules should be a cause of great concern in droplet-based single cell RNA-seq experiments since they can introduce both phantom cells and artifactual differentially expressed genes in downstream analyses. More importantly, even when the index hopping rate is very small, the fraction of phantom molecules in the entire dataset can be high due to the distributional properties of sequencing reads across samples. To our knowledge, current computational methods can neither accurately estimate the underlying rate of index hopping nor adequately correct for the resultant misassignment in droplet-based single cell RNA-seq data. Here, we introduce a probabilistic model that formalizes the phenomenon of index hopping and allows the accurate estimation of its rate. Application of the proposed model to several multiplexed datasets suggests that the sample index hopping probability for a given read is approximately 0.008, an arguably low number, even though, counter-intuitively, it can give rise to a large fraction of phantom molecules - as high as 85% - in any given sample. We also present a model-based approach for inferring the true sample of origin of the reads that are affected by index hopping, thus allowing the purging of the majority of phantom molecules in the data. Using simulated and empirical data, we show that we can reassign reads to their true sample of origin and remove predicted phantom molecules through a principled probabilistic procedure that optimally minimizes the false positive rate.
Thus, even though sample index hopping often substantially compromises single-cell RNA-seq data, it is possible to accurately quantify, detect, and reassign the affected reads and remove the phantom molecules generated by index hopping.
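A back-of-the-envelope calculation shows why a small per-read hopping probability can still produce many phantom molecules. This is a toy sketch, not the authors' probabilistic model. It assumes that every real molecule has the same number of reads, that a hopped read lands in any other sample with equal probability, and that each real molecule keeps at least one read in its own sample. The function name and parameter values are hypothetical, apart from the 0.008 rate quoted in the abstract.

```python
def expected_phantom_fraction(reads_per_molecule, hop_probability, n_samples):
    """Expected fraction of observed molecules that are phantoms (toy model).

    A phantom molecule appears in another sample as soon as at least one of
    the real molecule's reads hops there.
    """
    other_samples = n_samples - 1
    p_per_target = hop_probability / other_samples
    # Probability that a given other sample receives at least one hopped read.
    p_phantom_in_target = 1.0 - (1.0 - p_per_target) ** reads_per_molecule
    phantoms_per_real = other_samples * p_phantom_in_target
    # Each real molecule is counted once alongside the phantoms it spawns.
    return phantoms_per_real / (1.0 + phantoms_per_real)


# Hypothetical scenario: 8 multiplexed samples and the 0.008 rate from the abstract.
for reads in (1, 10, 100, 1000):
    fraction = expected_phantom_fraction(reads, hop_probability=0.008, n_samples=8)
    print(f"{reads:>5} reads per molecule -> phantom fraction {fraction:.3f}")
```

Under these assumptions the phantom fraction grows with the number of reads per molecule, because each additional read is another chance to seed a phantom in some other sample.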
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!