Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 60,222 bioRxiv papers from 267,718 authors.
Most tweeted bioRxiv papers, last 24 hours
326 results found.
105 tweets bioinformatics
Analysis of single-cell RNA-seq data begins with pre-processing of sequencing reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto (<https://pachterlab.github.io/kallisto/>) and bustools (<https://bustools.github.io/>) programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses. Documentation and tutorials for using the kallisto | bus workflow are available at <https://www.kallistobus.tools/>.
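To make the count-matrix step more concrete, here is a minimal Python sketch of how BUS-style records (cell barcode, UMI, equivalence class) can be collapsed into per-cell gene counts. It is not the bustools implementation. The plain-tuple record layout, the `ec_to_genes` mapping, the function name `bus_records_to_counts`, and the rule of discarding reads compatible with more than one gene are simplifying assumptions made only for illustration.

```python
from collections import defaultdict


def bus_records_to_counts(records, ec_to_genes):
    """Collapse BUS-like records into a sparse (barcode, gene) -> UMI count mapping.

    Hypothetical helper, not part of bustools.
    records:     iterable of (barcode, umi, ec) tuples, one per pseudoaligned read
    ec_to_genes: dict mapping an equivalence-class ID to the set of compatible genes
    """
    molecules = set()
    for barcode, umi, ec in records:
        genes = ec_to_genes[ec]
        # Simplification: keep only reads compatible with exactly one gene.
        if len(genes) == 1:
            molecules.add((barcode, umi, next(iter(genes))))
    counts = defaultdict(int)
    for barcode, _umi, gene in molecules:  # each distinct (barcode, UMI, gene) counts once
        counts[(barcode, gene)] += 1
    return dict(counts)
```

Duplicate reads that share a barcode and UMI collapse into one molecule, which is the main reason UMIs are used in the first place.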
In higher plants, germline differentiation occurs during a relatively short period within developing flowers. Understanding of the mechanisms that govern germline differentiation lags behind other plant developmental processes. This is largely because the germline is restricted to relatively few cells buried deep within floral tissues, which makes them difficult to study. To overcome this limitation, we have developed a methodology for live imaging of the germ cell lineage within floral organs of Arabidopsis using light sheet fluorescence microscopy. We have established reporter lines, cultivation conditions, and imaging protocols for high-resolution microscopy of developing flowers continuously for up to several days. We used multiview imaging to reconstruct a three-dimensional model of a flower at subcellular resolution. We demonstrate the power of this approach by capturing male and female meiosis, asymmetric pollen division, movement of meiotic chromosomes, and unusual restitution mitosis in tapetum cells. This method will enable new avenues of research into plant sexual reproduction.
Alexandre Almeida, Stephen Nayfach, Miguel Boland, Francesco Strozzi, Martin Beracochea, Zhou Jason Shi, Katherine S Pollard, Donovan H Parks, Philip Hugenholtz, Nicola Segata, Nikos Kyrpides, Robert D. Finn
Comprehensive reference data is essential for accurate taxonomic and functional characterization of the human gut microbiome. Here we present the Unified Human Gastrointestinal Genome (UHGG) collection, a resource combining 286,997 genomes representing 4,644 prokaryotic species from the human gut. These genomes contain over 625 million protein sequences used to generate the Unified Human Gastrointestinal Protein (UHGP) catalogue, a collection that more than doubles the number of gut protein clusters over the Integrated Gene Catalogue. We find that a large portion of the human gut microbiome remains to be fully explored, with over 70% of the UHGG species lacking cultured representatives, and 40% of the UHGP missing meaningful functional annotations. Intra-species genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which were specific to individual human populations. These freely available genomic resources should greatly facilitate investigations into the human gut microbiome.
The multiple testing problem arises not only when there are many voxels or vertices in an image representation of the brain, but also when multiple contrasts of parameter estimates (that is, hypotheses) are tested in the same general linear model. We argue that a correction for this multiplicity must be performed to avoid an excess of false positives. Various methods have been proposed in the literature, but few have been applied in brain imaging. Here we discuss and compare different methods for making such a correction in different scenarios, show that one classical and well-known method is invalid, and argue that permutation is the best option for performing this correction, owing to its exactness and its flexibility in handling a variety of common imaging situations.
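One common way to use permutation for this kind of correction is the max-statistic approach: for each permutation, record the largest |t| across all contrasts, then compare each observed |t| with that distribution to obtain family-wise error rate (FWER) adjusted p-values. The Python sketch below assumes a single response variable and a simple design whose rows are exchangeable under the global null (no nuisance covariates). Designs with nuisance regressors need schemes such as Freedman-Lane, which this sketch does not implement. The function names are illustrative, and this is not presented as the specific procedure evaluated by the authors.

```python
import numpy as np


def contrast_t(y, X, C):
    """t statistic of each contrast (row of C) in the OLS fit y ~ X."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - rank)
    XtX_inv = np.linalg.pinv(X.T @ X)
    # Var(c'beta) = sigma^2 * c' (X'X)^-1 c, computed for every contrast row c.
    se = np.sqrt(sigma2 * np.einsum("ij,jk,ik->i", C, XtX_inv, C))
    return (C @ beta) / se


def maxt_corrected_p(y, X, C, n_perm=1000, seed=0):
    """FWER-adjusted p-values across contrasts from the max |t| permutation distribution.

    Assumes rows of y are exchangeable under the global null (simple designs only).
    """
    rng = np.random.default_rng(seed)
    t_obs = np.abs(contrast_t(y, X, C))
    exceed = np.ones_like(t_obs)  # the unpermuted data count as one permutation
    for _ in range(n_perm - 1):
        y_perm = y[rng.permutation(len(y))]
        t_max = np.abs(contrast_t(y_perm, X, C)).max()
        exceed += t_max >= t_obs
    return exceed / n_perm
```

Because every contrast is compared with the same maximum, the adjustment automatically accounts for correlation between contrasts, which is part of what makes permutation attractive here.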
There is particular interest in transcriptome-wide association studies (TWAS), gene-level tests based on multi-SNP predictive models of gene expression, for identifying causal genes at loci associated with complex traits. However, interpretation of TWAS associations may be complicated by divergent effects of model SNPs on trait phenotype and gene expression. We developed an iterative modelling scheme for obtaining multi-SNP models of gene expression and applied this framework to generate expression models for 43 human tissues from the Genotype-Tissue Expression (GTEx) Project. We characterized the performance of single- and multi-SNP TWAS models for identifying causal genes in GWAS data for 46 circulating metabolites. We show that: (a) multi-SNP models captured more variation in expression than the top cis-eQTL (median 2-fold improvement); (b) predicted expression based on multi-SNP models was associated (FDR < 0.01) with metabolite levels for 826 unique gene-metabolite pairs, but, after step-wise conditional analyses, 90% were dominated by a single eQTL SNP; (c) among the 35% of associations where a SNP in the expression model was a significant cis-eQTL and metabolomic-QTL (met-QTL), 92% demonstrated colocalization between these signals, but interpretation was often complicated by incomplete overlap of QTLs in multi-SNP models; (d) using a "truth" set of causal genes at 61 met-QTLs, the sensitivity was high (67%), but the positive predictive value was low, as only 8% of TWAS associations at met-QTLs involved true causal genes. These results guide the interpretation of TWAS and highlight the need for corroborative data to assign causality with confidence.
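The core of a TWAS gene-level test can be sketched as two steps: predict each individual's expression from genotypes using fixed SNP weights, then test that prediction for association with the trait. The Python sketch below uses individual-level data and simple linear regression; many TWAS tools instead work from GWAS summary statistics. It does not reproduce the authors' iterative modelling scheme: the weights are assumed to be given, and the function name `twas_test` is hypothetical. A single-SNP model is the special case where only one weight is non-zero.

```python
import numpy as np


def twas_test(genotypes, weights, trait):
    """Gene-level association: regress the trait on genetically predicted expression.

    Hypothetical helper for illustration.
    genotypes: (n_individuals, n_snps) allele dosages (0, 1, 2)
    weights:   (n_snps,) expression-model weights; one non-zero entry = single-SNP model
    trait:     (n_individuals,) phenotype values
    Returns the regression slope and its t statistic.
    """
    pred = genotypes @ weights
    pred = (pred - pred.mean()) / pred.std()  # standardized predicted expression
    y = trait - trait.mean()
    beta = pred @ y / (pred @ pred)
    resid = y - beta * pred
    se = np.sqrt(resid @ resid / (len(y) - 2) / (pred @ pred))
    return beta, beta / se
```

Note that a significant slope only says that the weighted SNP combination tracks the trait; as the abstract points out, it does not by itself show that expression of this gene is causal.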
55 tweets microbiology
We recently introduced the Genome Taxonomy Database (GTDB), a phylogenetically consistent, genome-based taxonomy providing rank-normalized classifications for nearly 150,000 genomes from domain to genus. However, nearly 40% of the genomes used to infer the GTDB reference tree lack a species name, reflecting the large number of genomes in public repositories without complete taxonomic assignments. Here we address this limitation by proposing 24,706 species clusters that encompass all publicly available bacterial and archaeal genomes when using commonly accepted average nucleotide identity (ANI) criteria for circumscribing species. In contrast to previous ANI studies, we selected a single representative genome to serve as the nomenclatural type for circumscribing each species, using type strains where available. We complemented the 8,792 species clusters that have validly or effectively published names with 15,914 de novo species clusters in order to assign placeholder names to the growing number of genomes from uncultivated species. This provides the first complete domain-to-species taxonomic framework, which will improve communication of scientific results.
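Circumscribing species around a single representative genome can be sketched as greedy clustering: genomes are visited in order of preference (for example, type strains first), and each one either joins the closest existing representative above an ANI threshold or founds a new cluster. The Python sketch below assumes pairwise ANI values are already computed and uses the commonly cited 95% ANI threshold. Real pipelines typically also weigh alignment fraction and genome quality, which this sketch omits, and `cluster_by_ani` is an illustrative name rather than GTDB code.

```python
def cluster_by_ani(genomes, ani, threshold=95.0):
    """Greedy species clustering around representative genomes (illustrative only).

    genomes: genome IDs ordered by preference (e.g. type strains first)
    ani:     dict mapping (genome_a, genome_b) -> ANI in percent (either key order)
    Returns a dict mapping each genome to the representative of its cluster.
    """
    representatives = []
    assignment = {}
    for genome in genomes:
        best_rep, best_ani = None, threshold
        for rep in representatives:
            value = ani.get((genome, rep), ani.get((rep, genome), 0.0))
            if value >= best_ani:  # closest representative at or above the threshold
                best_rep, best_ani = rep, value
        if best_rep is None:
            representatives.append(genome)  # founds a new species cluster
            assignment[genome] = genome
        else:
            assignment[genome] = best_rep
    return assignment
```

Because the visiting order decides which genome becomes the representative, putting type strains first is what lets each cluster inherit a valid name where one exists.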
The free-living nematode Caenorhabditis elegans is a key laboratory model for metazoan biology. C. elegans is also used as a model for parasitic nematodes despite being only distantly related to most parasitic species. All ~65 Caenorhabditis species currently in culture are free-living, and most were isolated from decaying plant or fungal matter. Caenorhabditis bovis is a particularly unusual species, having been isolated several times from the inflamed ears of Zebu cattle in Eastern Africa, where it is believed to be the cause of bovine parasitic otitis. C. bovis is therefore of particular interest to researchers studying the evolution of nematode parasitism and Caenorhabditis diversity. However, because C. bovis is not in laboratory culture, it remains little studied, and details of its prevalence, its role in bovine parasitic otitis and its relationships to other Caenorhabditis species are scarce. Here, by sampling livestock markets and slaughterhouses in Western Kenya, we successfully reisolate C. bovis from the ear of adult female Zebu. We sequence the genome of C. bovis using the Oxford Nanopore MinION platform in a nearby field laboratory and use the data to generate a chromosome-scale draft genome sequence. We exploit this draft genome to reconstruct the phylogenetic relationships of C. bovis to other Caenorhabditis species and reveal the changes in genome size and content that have occurred during its evolution. We also identify expansions in several gene families that have been implicated in parasitism in other nematode species, including those associated with resistance to anthelmintic drugs. The high-quality draft genome and our analyses thereof represent a significant advance in our understanding of this unusual Caenorhabditis species.
Clement Gallay, Stefano Sanselicio, Mary E Anderson, Young Min Soh, Xue Liu, Gro Anita Stamsas, Simone Pelliciari, Renske van Raaphorst, Morten Kjos, Heath Murray, Stephan Gruber, Alan D. Grossman, Jan-Willem Veening
Most bacteria replicate and segregate their DNA concomitantly while growing, before cell division takes place. How bacteria synchronize these different cell cycle events to ensure faithful chromosome inheritance is poorly understood. Here, we identified a conserved and essential protein in pneumococci and related Firmicutes named CcrZ (for Cell Cycle Regulator protein interacting with FtsZ) that couples cell division with DNA replication by controlling the activity of the master initiator of DNA replication, DnaA. The absence of CcrZ causes mis-timed and reduced initiation of DNA replication, which subsequently results in aberrant cell division. We show that CcrZ from Streptococcus pneumoniae directly interacts with the cytoskeletal protein FtsZ to place it in the middle of the newborn cell, where the DnaA-bound origin is positioned. Together, this work uncovers a new mechanism for the control of the bacterial cell cycle in which CcrZ controls DnaA activity to ensure that the chromosome is replicated at the right time during the cell cycle.
Many theories propose recurrent interactions across the cortical hierarchy, but it is unclear whether cortical circuits are selectively wired to implement looped computations. Using subcellular channelrhodopsin-2-assisted circuit mapping in mouse visual cortex, we compared feedforward (FF) and feedback (FB) cortico-cortical input to cells projecting back to the input source (looped neurons) with input to cells projecting to a different cortical or subcortical area (non-looped neurons). Despite having different laminar innervation patterns, FF and FB afferents showed similar cell-type selectivity, making stronger connections with looped neurons than with non-looped neurons in layer (L) 5 and L6, but not in L2/3. FB inputs preferentially innervated the apical tufts of looped L5 neurons, but not their perisomatic dendrites. Our results reveal that interareal cortical connections are selectively wired into monosynaptic excitatory loops involving L6 and the apical dendrites of L5 neurons, supporting a role for these circuit elements in hierarchical recurrent computations.
The molecular mechanisms responsible for the formation of Topologically Associated Domains (TADs) are not yet fully understood. In Drosophila, it has been proposed that transcription is fundamental for TAD organization, while the participation of genetic sequences bound by Architectural Proteins (APs) remains controversial. Here, we investigate the contribution of domain boundaries to TAD organization and the regulation of gene expression at the Notch gene locus in Drosophila. We find that deletion of domain boundaries results in TAD fusion and long-range topological defects, accompanied by loss of AP and RNA Pol II chromatin binding and by defects in transcription. Together, our results provide compelling evidence for the contribution of discrete genetic sequences bound by APs and RNA Pol II to the partitioning of the genome into TADs and to the regulation of gene expression in Drosophila.
W. A. Nazni, A. A. Hoffmann, A. Noor Afizah, Y. L. Cheong, M. V. Mancini, N. Golding, M. R. G. Kamarul, A. K. M. Arif, H. Thohir, H. S. Nur Syamimi, M. Z. Nur Zatil Aqmar, M. M. Nur Ruqqayah, A. Siti Nor Syazwani, A. Faiz, M. N. F. R. Irfan, S. Rubaaini, N. Nuradila, M. M. N. Nizam, M. S. Mohamad Irwan, N. M. Endersby-Harshman, V. L. White, T.H. Ant, C. Herd, H. A. Hasnor, R. Abu Bakar, M. D. Hapsah, K Khadijah, D. Kamilan, S. C. Lee, M. Paid, K. Fadzilah, B. S. Gill, H. L. Lee, Steven P. Sinkins
Dengue has enormous health impacts globally. A novel approach to decreasing dengue incidence involves introducing Wolbachia endosymbionts that block dengue virus transmission into populations of the primary vector mosquito, Aedes aegypti. The wMel Wolbachia strain has previously been trialed in open releases of Ae. aegypti; however, the wAlbB strain has been shown to maintain higher density than wMel at high larval rearing temperatures. Releases of Ae. aegypti mosquitoes carrying wAlbB were carried out at 6 diverse sites with high endemic dengue transmission in greater Kuala Lumpur, Malaysia. The strain was successfully established and maintained at very high population frequency at some sites, and persisted with additional releases following fluctuations at other sites. Based on passive case monitoring, reduced human dengue incidence was observed in the release sites compared with control sites. The wAlbB strain of Wolbachia provides a promising tool for dengue control, particularly in very hot climates.
Adult height was one of the earliest putative examples of polygenic adaptation in humans. By constructing polygenic height scores using effect sizes and frequencies from hundreds of genomic loci robustly associated with height, it was reported that Northern Europeans were genetically taller than Southern Europeans beyond neutral expectation. However, this inference was recently challenged. Sohail et al. and Berg et al. showed that the polygenic signature disappeared when summary statistics from the UK Biobank (UKB) were used in the analysis, suggesting that residual uncorrected stratification in large-scale consortium studies was responsible for the previously noted genetic difference. It thus remains an open question whether height loci exhibit signals of polygenic adaptation in any human population. In the present study, we re-examined this question, focusing on one of the shortest European populations, the Sardinians, as well as on mainland European populations in general. We found that summary statistics from UKB significantly correlate with population structure in Europe. To further alleviate concerns about biased ascertainment of GWAS loci, we examined height-associated loci from BioBank Japan (BBJ). Applying frequency-based inference over these height-associated loci, we showed that the Sardinians remain significantly shorter than expected (~0.35 standard deviations shorter than CEU based on polygenic height scores, P = 1.95e-6). We also found that the trajectory of polygenic height scores decreased over at least the last 10,000 years when compared to the British population (P = 0.0123), consistent with a signature of polygenic adaptation at height-associated loci. Although the same approach showed a much subtler signature in mainland European populations, we found a clear and robust adaptive signature in the UK population using a haplotype-based statistic, tSDS, driven by the height-increasing alleles (P = 4.8e-4).
In summary, by examining frequencies at height loci ascertained in a distant East Asian population, we provide further support for polygenic adaptation at height-associated loci among the Sardinians. In mainland Europeans, we also found an adaptive signature, although it became pronounced only in the haplotype-based analysis.
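A population-level polygenic score of the kind described above can be sketched as the expected additive score: for each locus, twice the effect size times the frequency of the effect allele, summed over loci (the expected allele dosage is 2p). The Python sketch below shows only this calculation, assuming additive effects. The frequency-based test in the study additionally compares the observed difference with a neutral expectation, which is not implemented here, and the function name and all numbers are hypothetical.

```python
import numpy as np


def population_polygenic_score(effect_sizes, effect_allele_freqs):
    """Expected additive polygenic score of a population: sum over loci of 2 * beta * p.

    Hypothetical helper; effect sizes and frequencies are assumed inputs.
    """
    beta = np.asarray(effect_sizes, dtype=float)
    p = np.asarray(effect_allele_freqs, dtype=float)
    return float(2.0 * np.sum(beta * p))


# Hypothetical effect sizes (e.g. from a GWAS) and frequencies in two populations.
betas = [0.1, -0.2, 0.05]
pop_a = population_polygenic_score(betas, [0.6, 0.3, 0.5])
pop_b = population_polygenic_score(betas, [0.4, 0.5, 0.5])
difference = pop_a - pop_b  # positive: population A is predicted to be taller
```

Ascertaining the effect sizes in a distant population, as the study does with BBJ, is meant to keep stratification in the discovery GWAS from leaking into exactly this kind of difference.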
21 tweets cell biology
Fibrillar adhesions are important structural and adhesive components in fibroblasts that are critical for fibronectin fibrillogenesis. While nascent and focal adhesions are known to respond to mechanical cues, the mechanoresponsive nature of fibrillar adhesions remains unclear. Here, we used ratiometric analysis of paired adhesion components to determine an appropriate fibrillar adhesion marker. We found that active α5β1-integrin exhibits the most definitive fibrillar adhesion localisation compared to other proteins, such as tensin1, reported to be in fibrillar adhesions. To elucidate the mechanoresponsiveness of fibrillar adhesions, we designed and fabricated thin polyacrylamide (PA) hydrogels, embedded with fluorescently labelled beads, with physiologically relevant stiffness gradients using a cost-effective and reproducible technique. We generated a correlation curve between bead density and hydrogel stiffness, allowing bead density to be used as a readout of stiffness and eliminating the need for specialised know-how such as atomic force microscopy (AFM). We find that stiffness promotes the growth of fibrillar adhesions in a tensin-dependent manner. Thus, the formation of these ECM-depositing structures is coupled to the mechanical parameters of the cell environment and may enable cells to fine-tune their matrix environment in response to changing physical conditions.
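Using bead density as a stiffness readout amounts to a calibration curve: fit stiffness against bead density on reference gels once, then invert new density measurements into stiffness estimates. The Python sketch below assumes a straight-line relationship fitted by least squares; the abstract only mentions a correlation curve, so the functional form, units, function names, and calibration numbers are all hypothetical.

```python
import numpy as np


def fit_calibration(bead_density, stiffness):
    """Least-squares line stiffness = slope * density + intercept (linear form assumed)."""
    slope, intercept = np.polyfit(bead_density, stiffness, deg=1)
    return slope, intercept


def stiffness_from_density(density, slope, intercept):
    """Convert measured bead densities to stiffness estimates with the fitted line."""
    return slope * np.asarray(density, dtype=float) + intercept


# Hypothetical calibration gels: beads per unit area vs. stiffness measured once by AFM.
slope, intercept = fit_calibration([10, 20, 30, 40], [2.0, 4.1, 5.9, 8.0])
estimate = stiffness_from_density([25], slope, intercept)
```

If the real relationship is non-linear, `np.polyfit` with a higher degree or a fitted monotone curve would be the natural next step; the point is only that AFM is needed for calibration, not for every gel.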
Preferences for primary goods, such as food items, are variable and depend on current physiological state. Decisions on cultural goods, which entail no direct physiological consequence, rely on the same neural underpinnings of valuation and choice as decisions on primary goods. Here, we test whether subjective preferences for cultural goods inherit a functional link with the neural circuitry monitoring physiological variables. Using magnetoencephalography in human participants choosing between two movies, we measured heartbeat-evoked responses (HERs), which index the cortical monitoring of cardiac contractions. Trial-by-trial fluctuations in HERs, measured before option presentation, influenced the subsequent neural encoding of subjective value in ventromedial prefrontal cortex, a region involved in both valuation and cardiac monitoring. The neural interaction between HERs and value encoding enhanced trial-by-trial choice precision and predicted inter-individual differences in choice consistency. Enhanced monitoring of physiological variables thus supports more stable representations of subjective, self-related cognitive information.
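Heartbeat-evoked responses are generally obtained by averaging the brain signal in time windows locked to the R-peaks of the simultaneously recorded ECG. The Python sketch below shows that averaging step for a single sensor, assuming R-peak sample indices have already been detected. The window bounds, function name, and parameter names are hypothetical, and real pipelines add artifact correction and statistics across sensors that are omitted here.

```python
import numpy as np


def heartbeat_evoked_response(signal, r_peaks, fs, tmin=-0.1, tmax=0.6):
    """Average one channel in windows time-locked to ECG R-peaks (illustrative only).

    signal:  1-D array of samples from one MEG/EEG channel
    r_peaks: sample indices of detected R-peaks
    fs:      sampling rate in Hz; tmin/tmax: hypothetical window bounds in seconds
    """
    start = int(round(tmin * fs))
    stop = int(round(tmax * fs))
    epochs = [
        signal[p + start:p + stop]
        for p in r_peaks
        if p + start >= 0 and p + stop <= len(signal)  # skip windows off the recording
    ]
    return np.mean(epochs, axis=0)
```

Averaging across many heartbeats cancels activity that is not locked to the cardiac cycle, leaving the cortical response to each heartbeat.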
20 tweets ecology
Many populations are affected by hunting or fishing. Models designed to assess the sustainability of harvest management require accurate estimates of demographic parameters (e.g., survival, reproduction) that are hard to estimate from the limited data collected on exploited populations. The joint analysis of different data sources with integrated population models (IPMs) is an optimal framework for obtaining reliable estimates of parameters that are usually difficult to estimate, while accounting for imperfect detection and observation error. The IPMs built so far for exploited populations have integrated count-based surveys and catch-at-age data into age-class-structured population models. But the age of harvested individuals is difficult to assess and often not recorded, and population counts are often not performed on a regular basis, limiting the use of these models for monitoring exploited populations. Here, we propose an IPM that makes efficient use of data commonly collected in exploited marine and terrestrial vertebrate populations. Because individual measures of body mass at both capture and death are often collected in fish and terrestrial game species, our model integrates capture-mark-recapture-recovery data and data collected at death into a body mass-structured population model. It allows the observed number of harvested individuals to be compared with the expected number and provides accurate estimates of demographic parameters. We illustrate the usefulness of this IPM with a case study of an emblematic game species distributed worldwide, the wild boar Sus scrofa. For this species, which has increased in distribution and abundance over recent decades, the model provides accurate and precise annual estimates of key demographic parameters (survival, reproduction, growth) and of population size while accounting for imperfect detection and observation error.
To avoid overexploitation of declining populations or under-exploitation of increasing ones, it is crucial to gain a good understanding of the dynamics of exploited populations. When managers or conservationists have limited demographic data, the IPM offers a powerful framework for assessing population dynamics. Being highly flexible, the approach is broadly applicable to terrestrial and marine exploited populations for which body mass is commonly recorded and, more generally, to all populations affected by anthropogenic mortality.
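The process part of a body mass-structured model can be sketched as an annual projection: individuals in each mass class may be harvested, survive natural mortality, move between mass classes, and reproduce. The expected harvest produced by such a step is what can be compared with the observed number harvested. In the Python sketch below, the ordering of events, every parameter value, and the function name are hypothetical, and the IPM's joint estimation from capture-mark-recapture-recovery and at-death data is not attempted.

```python
import numpy as np


def project_one_year(n, survival, harvest_rate, growth, fecundity):
    """One annual step of a toy body mass-structured model with harvest (illustrative).

    n:            individuals per mass class at the start of the year
    survival:     natural survival probability per class (applied after harvest here)
    harvest_rate: probability of being harvested per class
    growth:       growth[j, i] = probability of moving from class i to class j
                  (each column sums to 1)
    fecundity:    recruits produced per surviving individual in each class
    Returns (abundance per class next year, expected number harvested per class).
    """
    n = np.asarray(n, dtype=float)
    harvested = n * np.asarray(harvest_rate, dtype=float)
    survivors = (n - harvested) * np.asarray(survival, dtype=float)
    next_n = np.asarray(growth, dtype=float) @ survivors
    next_n[0] += np.asarray(fecundity, dtype=float) @ survivors  # recruits join the lightest class
    return next_n, harvested
```

In an IPM, parameters like these would be estimated rather than fixed, and the expected harvest would enter the likelihood alongside the individual data.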
19 tweets bioinformatics
Gene annotation is a critical bottleneck in genomic research, especially for the comprehensive study of very large gene families in the genomes of non-model organisms. Despite recent progress in automatic methods, the tools developed for this task often produce inaccurate annotations, such as fused, chimeric, partial or even completely absent gene models for many family copies, which require considerable extra effort to correct. Here we present BITACORA, a bioinformatics solution that integrates sequence similarity search tools and Perl scripts to facilitate both the curation of these inaccurate annotations and the identification of previously undetected gene family copies directly from DNA sequences. We tested the performance of the BITACORA pipeline in annotating the members of two chemosensory gene families of different sizes in seven available chelicerate genome drafts. Despite the relatively high fragmentation of some of these drafts, BITACORA improved the annotation of many members of these families and detected thousands of new chemoreceptors encoded in the genome sequences. The program outputs a General Feature Format (GFF) file containing both curated and novel gene models, and a FASTA file with the predicted proteins. These outputs can be easily integrated into genome annotation editors, greatly facilitating subsequent manual annotation and downstream evolutionary analyses.
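The abstract does not show BITACORA's own output code, so the Python sketch below only illustrates what a GFF3 gene model (gene, mRNA, and CDS rows with the nine tab-separated columns) and a FASTA protein record look like. The function names, the `source` value, and the `ID`/`Parent` naming scheme are hypothetical. CDS phase is computed from the number of coding bases that precede each segment in transcription order.

```python
def gene_model_to_gff3(seqid, gene_id, strand, cds_segments, source="sketch"):
    """GFF3 lines (gene, mRNA, CDS) for one gene model (hypothetical helper).

    cds_segments: list of (start, end) 1-based, inclusive coordinates on seqid.
    """
    segments = sorted(cds_segments)
    start, end = segments[0][0], segments[-1][1]

    def row(ftype, s, e, phase, attrs):
        # The nine tab-separated GFF3 columns; "." marks an empty score.
        return "\t".join([seqid, source, ftype, str(s), str(e), ".", strand, phase, attrs])

    lines = [
        "##gff-version 3",
        row("gene", start, end, ".", f"ID={gene_id}"),
        row("mRNA", start, end, ".", f"ID={gene_id}.t1;Parent={gene_id}"),
    ]
    # Phase = bases to skip before the next full codon, given coding bases already seen.
    ordered = segments if strand == "+" else segments[::-1]
    coding_bases = 0
    for s, e in ordered:
        phase = (3 - coding_bases % 3) % 3
        lines.append(row("CDS", s, e, str(phase), f"ID={gene_id}.cds;Parent={gene_id}.t1"))
        coding_bases += e - s + 1
    return "\n".join(lines)


def to_fasta(name, protein, width=60):
    """FASTA record with the sequence wrapped at `width` characters."""
    body = "\n".join(protein[i:i + width] for i in range(0, len(protein), width))
    return f">{name}\n{body}"
```

Files in this shape can be loaded by standard annotation editors, which is the integration step the abstract describes.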
If HA Barnes, Ximena Ibarra-Soria, Stephen Fitzgerald, Jose M. Gonzalez, Claire Davidson, Matthew P Hardy, Deepa Manthravadi, Laura Van Gerven, Mark Jorissen, Zhen Zeng, Mona Khan, Peter Mombaerts, Jennifer Harrow, Darren W Logan, Adam Frankish
Olfactory receptor (OR) genes are the largest multi-gene family in the mammalian genome, with over 850 genes in human and nearly 1,500 in mouse. The expansion of the OR gene repertoire has occurred through numerous duplication events followed by diversification, resulting in a large number of highly similar paralogous genes. These characteristics have made the annotation of the complete OR gene repertoire a complex task. Most OR genes have been predicted in silico and are typically annotated as intronless coding sequences. Here we have developed an expert curation pipeline to analyse and annotate every OR gene in the human and mouse reference genomes. By combining evidence from structural features, evolutionary conservation and experimental data, we have unified the annotation of these gene families, and have systematically determined the protein-coding potential of each locus. We have defined the non-coding regions of many OR genes, enabling us to generate full-length transcript models. We found that 13 human and 41 mouse OR loci have coding sequences that are split across two exons. These split OR genes are conserved across mammals, and are expressed at the same level as protein-coding OR genes with an intronless coding region. Our findings challenge the long-standing and widespread notion that the coding region of a vertebrate OR gene is contained within a single exon.
Proton pump inhibitor (PPI) use has been associated with microbiota alterations and susceptibility to Clostridium difficile infections (CDIs) in humans. We assessed how PPI treatment alters the fecal microbiota and whether treatment promotes CDIs in a mouse model. Mice receiving a PPI treatment were gavaged with 40 mg/kg of omeprazole during a 7-day pretreatment phase, the day of C. difficile challenge, and the following 9 days. We found that mice treated with omeprazole were not colonized by C. difficile. When omeprazole treatment was combined with a single clindamycin treatment, one cage of mice remained resistant to C. difficile colonization, while the other cage was colonized. Treating mice with only clindamycin followed by challenge resulted in C. difficile colonization. 16S rRNA gene sequencing analysis revealed that omeprazole had minimal impact on the structure of the murine microbiota throughout the 16 days of omeprazole exposure. These results suggest omeprazole treatment alone is not sufficient to disrupt microbiota resistance to C. difficile infection in mice that are normally resistant in the absence of antibiotic treatment.
16 tweets systems biology
Compositional changes in the gut microbiota have been associated with a variety of medical conditions such as obesity, Crohn's disease and diabetes. However, connecting microbial community composition to ecosystem function remains a challenge. Here, we introduce MICOM, a customizable metabolic model of the human gut microbiome. By using a heuristic optimization approach based on L2 regularization, we were able to obtain a unique set of realistic growth rates that corresponded well with observed replication rates. We integrated adjustable dietary and taxon abundance constraints to generate personalized metabolic models for individual metagenomic samples. We applied MICOM to a balanced cohort of metagenomes from 186 people, including a metabolically healthy population and individuals with type 1 and type 2 diabetes. Model results showed that individual bacterial genera maintained conserved niche structures across humans, while the community-level production of short chain fatty acids (SCFAs) was heterogeneous and highly individual-specific. Model output revealed complex cross-feeding interactions that would be difficult to measure in vivo. Metabolic interaction networks differed somewhat consistently between healthy and diabetic subjects. In particular, MICOM predicted reduced butyrate and propionate production in a diabetic cohort, with restoration of SCFA production profiles found in healthy subjects following metformin treatment. Overall, we found that changes in diet or taxon abundances have highly personalized effects. We believe MICOM can serve as a useful tool for generating mechanistic hypotheses for how diet and microbiome composition influence community function. All methods are implemented in an open-source Python package, which is available at https://github.com/micom-dev/micom.
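Why L2 regularization yields a unique set of growth rates can be seen in a toy problem that drops the metabolic network entirely. Among all growth-rate vectors that reach a fixed abundance-weighted community growth rate, with each taxon capped at its own maximum, minimizing the sum of squared growth rates has exactly one solution. The optimality conditions give each taxon the rate min(u_i, t·a_i) for a single scalar t, which can be found by bisection. The Python sketch below is not MICOM's API. The function name, abundances, maximum rates, and target are hypothetical inputs, and positive abundances are assumed.

```python
import numpy as np


def l2_growth_rates(abundances, max_rates, community_growth, tol=1e-12):
    """Minimize sum(mu_i**2) s.t. sum(a_i * mu_i) == community_growth, 0 <= mu_i <= u_i.

    Toy illustration only: no metabolic constraints; abundances must be positive.
    """
    a = np.asarray(abundances, dtype=float)
    u = np.asarray(max_rates, dtype=float)
    if a @ u < community_growth:
        raise ValueError("target community growth is not achievable")
    # The optimum has the form mu_i = min(u_i, t * a_i); bisect on the scalar t.
    lo, hi = 0.0, float(np.max(u / a))
    while hi - lo > tol:
        t = 0.5 * (lo + hi)
        if a @ np.minimum(u, t * a) < community_growth:
            lo = t
        else:
            hi = t
    return np.minimum(u, hi * a)


# Hypothetical two-taxon community: equal abundance, one taxon capped at a low rate.
rates = l2_growth_rates([0.5, 0.5], [0.2, 2.0], community_growth=0.5)
```

Without the squared penalty, many different rate vectors would reach the same community growth (for example, one taxon doing all the growing), which is the non-uniqueness the regularization removes.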
16 tweets bioinformatics
Next Generation Sequencing (NGS) has become the standard method for detecting Single Nucleotide Variants (SNVs) in tumor cells. These technologies require a PCR amplification step and a sequencing step, both of which introduce artifacts at very low frequencies. These artifacts are often confused with true low-frequency variants that can be found in tumor cells and cell-free DNA. The recent use of Unique Molecular Identifiers (UMIs) in targeted sequencing protocols offers a trustworthy approach to filtering out artifactual variants and accurately calling low-frequency variants. However, integrating UMI analysis into the variant calling process has produced tools that are significantly slower and more memory-consuming than raw-read-based variant callers. We present UMI-VarCal, a UMI-based variant caller for targeted sequencing data with better sensitivity than other variant callers. Developed with performance in mind, UMI-VarCal is one of the few variant callers that do not rely on SAMtools for their pileup. Instead, at its core is a custom pileup algorithm specifically designed to handle the UMI tags in the reads. After the pileup, a Poisson statistical test is applied at every position to determine whether the variant frequency is significantly higher than the background error noise. Finally, the UMI tags are analyzed, and strand bias and homopolymer length filters are applied to improve accuracy. We illustrate the results obtained with UMI-VarCal by sequencing tumor samples and show that UMI-VarCal is both faster and more sensitive than other publicly available solutions.
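A per-position Poisson test of the kind described above can be sketched as follows: with depth d and a background error rate e, the number of erroneous alternative reads is modelled as Poisson with mean d·e, and a site is flagged when observing at least k alternative reads is improbable under that model. The Python sketch below uses only the standard library. The error rate, significance threshold, and function names are hypothetical, and UMI-VarCal's exact error model and downstream UMI, strand-bias, and homopolymer filters are not reproduced.

```python
import math


def poisson_pmf(i, lam):
    """P(X = i) for X ~ Poisson(lam), computed in log space; requires lam > 0."""
    return math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k > lam:
        # Terms shrink once i exceeds lam, so sum the (small) upper tail directly.
        total, term, i = 0.0, poisson_pmf(k, lam), k
        while term > 0 and term > total * 1e-16:
            total += term
            i += 1
            term *= lam / i
        return min(total, 1.0)
    # k <= lam: the tail is large, so 1 - P(X <= k - 1) is accurate enough.
    lower, term, i = 0.0, poisson_pmf(k - 1, lam), k - 1
    while i >= 0 and term > 0 and term > lower * 1e-16:
        lower += term
        term *= i / lam  # P(X = i - 1) = P(X = i) * i / lam
        i -= 1
    return max(0.0, 1.0 - lower)


def is_variant(alt_count, depth, error_rate=1e-3, alpha=1e-5):
    """Flag a site whose alt-read count exceeds the expected background noise.

    error_rate and alpha are hypothetical defaults, not UMI-VarCal's settings.
    """
    p = poisson_upper_tail(alt_count, depth * error_rate)
    return p < alpha, p
```

Summing the upper tail directly when k exceeds the mean keeps very small p-values accurate, which matters when thousands of positions are tested.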
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!