Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 52,519 bioRxiv papers from 243,473 authors.
Most downloaded bioRxiv papers, all time
in category molecular biology
1,534 results found.
22,254 downloads molecular biology
Forward genetic screens are powerful tools for the unbiased discovery and functional characterization of specific genetic elements associated with a phenotype of interest. Recently, the RNA-guided endonuclease Cas9 from the microbial immune system CRISPR (clustered regularly interspaced short palindromic repeats) has been adapted for genome-scale screening by combining Cas9 with guide RNA libraries. Here we describe a protocol for genome-scale knockout and transcriptional activation screening using the CRISPR-Cas9 system. Custom- or ready-made guide RNA libraries are constructed and packaged into lentivirus for delivery into cells for screening. As each screen is unique, we provide guidelines for determining screening parameters and maintaining sufficient coverage. To validate candidate genes identified from the screen, we further describe strategies for confirming the screening phenotype as well as genetic perturbation through analysis of indel rate and transcriptional activation. Beginning with library design, a genome-scale screen can be completed in 6-10 weeks followed by 3-4 weeks of validation.
21,239 downloads molecular biology
Rahul Sinha, Geoff Stanley, Gunsagar Singh Gulati, Camille Ezran, Kyle Joseph Travaglini, Eric Wei, Charles Kwok Fai Chan, Ahmad N Nabhan, Tianying Su, Rachel Marie Morganti, Stephanie Diana Conley, Hassan Chaib, Kristy Red-Horse, Michael T Longaker, Michael P Snyder, Mark A Krasnow, Irving L Weissman
Illumina-based next generation sequencing (NGS) has accelerated biomedical discovery through its ability to generate thousands of gigabases of sequencing output per run at a fraction of the time and cost of conventional technologies. The process typically involves four basic steps: library preparation, cluster generation, sequencing, and data analysis. In 2015, a new chemistry of cluster generation was introduced in the newer Illumina machines (HiSeq 3000/4000/X Ten) called exclusion amplification (ExAmp), which was a fundamental shift from the earlier method of random cluster generation by bridge amplification on a non-patterned flow cell. The ExAmp chemistry, in conjunction with patterned flow cells containing nanowells at fixed locations, increases cluster density on the flow cell, thereby reducing the cost per run. It also increases sequence read quality, especially for longer read lengths (up to 150 base pairs). This advance has been widely adopted for genome sequencing because greater sequencing depth can be achieved for lower cost without compromising the quality of longer reads. We show that this promising chemistry is problematic, however, when multiplexing samples. We discovered that up to 5-10% of sequencing reads (or signals) are incorrectly assigned from a given sample to other samples in a multiplexed pool. We provide evidence that this “spreading-of-signals” arises from low levels of free index primers present in the pool. These index primers can prime pooled library fragments at random via complementary 3′ ends, and get extended by DNA polymerase, creating a new library molecule with a new index before binding to the patterned flow cell to generate a cluster for sequencing. This causes the resulting read from that cluster to be assigned to a different sample, causing the spread of signals within multiplexed samples. 
We show that low levels of free index primers persist after the most common library purification procedure recommended by Illumina, and that the amount of signal spreading among samples is proportional to the level of free index primer present in the library pool. This artifact causes homogenization and misclassification of cells in single cell RNA-seq experiments. Therefore, all data generated in this way must now be carefully re-examined to ensure that “spreading-of-signals” has not compromised data analysis and conclusions. Re-sequencing samples using an older technology that uses conventional bridge amplification for cluster generation, or improved library cleanup strategies to remove free index primers, can minimize or eliminate this signal spreading artifact.
7,863 downloads molecular biology
CRISPR/Cas technologies have transformed our ability to manipulate genomes for research and gene-based therapy. In particular, homology-directed repair after genomic cleavage allows for precise modification of genes using exogenous donor sequences as templates. While both single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) forms of donors have been used as repair templates, a systematic comparison of the performance and specificity of repair using ssDNA versus dsDNA donors is still lacking. Here, we describe an optimized method for the synthesis of long ssDNA templates and demonstrate that ssDNA donors can drive efficient integration of gene-sized reporters in human cell lines. We next define a set of rules to maximize the efficiency of ssDNA-mediated knock-in by optimizing donor design. Finally, by comparing ssDNA donors with equivalent dsDNA sequences (PCR products or plasmids), we demonstrate that ssDNA templates have a unique advantage in terms of repair specificity while dsDNA donors can lead to a high rate of off-target integration. Our results provide a framework for designing high-fidelity CRISPR-based knock-in experiments, in both research and therapeutic settings.
5,043 downloads molecular biology
Pinar Akcakaya, Maggie L. Bobbin, Jimmy A. Guo, Jose Malagon Lopez, M. Kendell Clement, Sara P. Garcia, Mick D. Fellows, Michelle J. Porritt, Mike A. Firth, Alba Carreras, Tania Baccega, Frank Seeliger, Mikael Bjursell, Shengdar Q. Tsai, Nhu T. Nguyen, Roberto Nitsch, Lorenz Mayr, Luca Pinello, Mohammad Bohlooly-Y, Martin J Aryee, Marcello Maresca, J. Keith Joung
CRISPR-Cas genome-editing nucleases hold substantial promise for human therapeutics but identifying unwanted off-target mutations remains an important requirement for clinical translation. For ex vivo therapeutic applications, previously published cell-based genome-wide methods provide potentially useful strategies to identify and quantify these off-target mutation sites. However, a well-validated method that can reliably identify off-targets in vivo has not been described to date, leaving the question of whether and how frequently these types of mutations occur. Here we describe Verification of In Vivo Off-targets (VIVO), a highly sensitive, unbiased, and generalizable strategy that we show can robustly identify genome-wide CRISPR-Cas nuclease off-target effects in vivo. To our knowledge, these studies provide the first demonstration that CRISPR-Cas nucleases can induce substantial off-target mutations in vivo, a result we obtained using a deliberately promiscuous guide RNA (gRNA). More importantly, we used VIVO to show that appropriately designed gRNAs can direct efficient in vivo editing without inducing detectable off-target mutations. Our findings provide strong support for and should encourage further development of in vivo genome editing therapeutic strategies.
5,008 downloads molecular biology
The study of microbial communities has been revolutionised in recent years by the widespread adoption of culture independent analytical techniques such as 16S rRNA gene sequencing and metagenomics. One potential confounder of these sequence-based approaches is the presence of contamination in DNA extraction kits and other laboratory reagents. In this study we demonstrate that contaminating DNA is ubiquitous in commonly used DNA extraction kits, varies greatly in composition between different kits and kit batches, and that this contamination critically impacts results obtained from samples containing a low microbial biomass. Contamination impacts both PCR-based 16S rRNA gene surveys and shotgun metagenomics. These results suggest that caution should be advised when applying sequence-based techniques to the study of microbiota present in low biomass environments. We provide an extensive list of potential contaminating genera, and guidelines on how to mitigate the effects of contamination. Concurrent sequencing of negative control samples is strongly advised.
4,923 downloads molecular biology
Long noncoding RNAs (lncRNAs) are a diverse class of RNAs with increasingly appreciated functions in vertebrates, yet much of their biology remains poorly understood. In particular, it is unclear to what extent the current catalog of over 10,000 distinct annotated lncRNAs is indeed devoid of genes coding for proteins. Here we review the available computational and experimental schemes for distinguishing between coding and noncoding transcripts, and their recent genome-wide applications. We conclude that the model most consistent with available data is that a large number of mammalian lncRNAs undergo translation, but only a very small minority of such translation events result in stable and functional peptides. The outcome of the majority of the translation events and their potential biological purposes remain an intriguing topic for future investigation.
4,673 downloads molecular biology
We previously described a novel alternative to Chromatin Immunoprecipitation, Cleavage Under Targets & Release Using Nuclease (CUT&RUN), in which unfixed permeabilized cells are incubated with antibody, followed by binding of a Protein A-Micrococcal Nuclease (pA/MNase) fusion protein (1). Upon activation of tethered MNase, the bound complex is excised and released into the supernatant for DNA extraction and sequencing. Here we introduce four enhancements to CUT&RUN: 1) a hybrid Protein A-Protein G-MNase construct that expands antibody compatibility and simplifies purification; 2) a modified digestion protocol that inhibits premature release of the nuclease-bound complex; 3) a calibration strategy based on carry-over of E. coli DNA introduced with the fusion protein; and 4) a novel peak-calling strategy customized for the low-background profiles obtained using CUT&RUN. These new features, coupled with the previously described low-cost, high efficiency, high reproducibility and high-throughput capability of CUT&RUN, make it the method of choice for routine epigenomic profiling.
4,361 downloads molecular biology
Bernd Zetsche, Matthias Heidenreich, Prarthana Mohanraju, Iana Fedorova, Jeroen Kneppers, Ellen M DeGennaro, Nerges Winblad, Sourav R Choudhury, Omar O Abudayyeh, Jonathan S Gootenberg, Wen Y Wu, David A Scott, Konstantin Severinov, John van der Oost, Feng Zhang
Microbial CRISPR-Cas defense systems have been adapted as a platform for genome editing applications built around the RNA-guided effector nucleases, such as Cas9. We recently reported the characterization of Cpf1, the effector nuclease of a novel type V-A CRISPR system, and demonstrated that it can be adapted for genome editing in mammalian cells. Unlike Cas9, which utilizes a trans-activating crRNA (tracrRNA) as well as the endogenous RNaseIII for maturation of its dual crRNA:tracrRNA guides, guide processing of the Cpf1 system proceeds in the absence of tracrRNA or other Cas (CRISPR associated) genes, suggesting that Cpf1 is sufficient for pre-crRNA maturation. This has important implications for genome editing, as it would provide a simple route to multiplex targeting. Here, we show for two Cpf1 orthologs that no other factors are required for array processing and demonstrate multiplex gene editing in mammalian cells as well as in the mouse brain by using a designed single CRISPR array.
4,275 downloads molecular biology
Omar O Abudayyeh, Jonathan S Gootenberg, Silvana Konermann, Julia Joung, Ian M Slaymaker, David B.T. Cox, Sergey Shmakov, Kira S. Makarova, Ekaterina Semenova, Leonid Minakhin, Konstantin Severinov, Aviv Regev, Eric S Lander, Eugene V. Koonin, Feng Zhang
The CRISPR-Cas adaptive immune system defends microbes against foreign genetic elements via DNA or RNA-DNA interference. We characterize the Class 2 type VI-A CRISPR-Cas effector C2c2 and demonstrate its RNA-guided RNase function. C2c2 from the bacterium Leptotrichia shahii provides interference against RNA phage. In vitro biochemical analyses show that C2c2 is guided by a single crRNA and can be programmed to cleave ssRNA targets carrying complementary protospacers. In bacteria, C2c2 can be programmed to knock down specific mRNAs. Cleavage is mediated by catalytic residues in the two conserved HEPN domains, mutations in which generate catalytically inactive RNA-binding proteins. These results broaden our understanding of CRISPR-Cas systems and suggest that C2c2 can be used to develop new RNA-targeting tools.
3,990 downloads molecular biology
As new sequencing technologies become cheaper and older ones disappear, laboratories switch vendors and platforms. Validating the new setups is a crucial part of conducting rigorous scientific research. Here we report on the reliability and biases of performing bacterial 16S rRNA gene amplicon paired-end sequencing on the MiSeq Illumina platform. We designed a protocol using 50 barcode pairs to run samples in parallel and coded a pipeline to process the data. Sequencing the same sediment sample in 248 replicates as well as 70 samples from alkaline soda lakes, we evaluated the performance of the method with regard to estimates of alpha and beta diversity. Using different purification and DNA quantification procedures we always found up to 5-fold differences in the yield of sequences between individually barcoded samples. Using either a one-step or a two-step PCR preparation resulted in significantly different estimates in both alpha and beta diversity. Comparing with a previous method based on 454 pyrosequencing, we found that our Illumina protocol performed in a similar manner, with the exception of evenness estimates, where correspondence between the methods was low. We further quantified the data loss at every processing step, eventually accumulating to 50% of the raw reads. When evaluating different OTU clustering methods, we observed a stark contrast between the results of QIIME with default settings and the more recent UPARSE algorithm when it comes to the number of OTUs generated. Still, overall trends in alpha and beta diversity corresponded highly using both clustering methods. Our procedure performed well considering the precisions of alpha and beta diversity estimates, with insignificant effects of individual barcodes. Comparative analyses suggest that 454 and Illumina sequence data can be combined if the same PCR protocol and bioinformatic workflows are used for describing patterns in richness, beta-diversity and taxonomic composition.
(version 1.1 resubmitted to PLOS one 2014-Sept-08)
3,699 downloads molecular biology
The noncoding genome plays a major role in gene regulation and disease yet we lack tools for rapid identification and manipulation of noncoding elements. Here, we develop a large-scale CRISPR screen employing ~18,000 sgRNAs targeting >700 kb of noncoding sequence in an unbiased manner surrounding three genes (NF1, NF2, and CUL3) involved in resistance to the BRAF inhibitor vemurafenib in the BRAF-mutant melanoma cell line A375. We identify specific noncoding locations near genes that modulate drug resistance when mutated. These sites have predictive hallmarks of noncoding function, such as physical interaction with gene promoters, evolutionary conservation and tissue-specific chromatin accessibility. At a subset of identified elements at the CUL3 locus, we show that engineered mutations lead to a loss of gene expression associated with changes in transcription factor occupancy and in long-range and local epigenetic environments, implicating these sites in gene regulation and chemotherapeutic resistance. This demonstration of an unbiased mutagenesis screen across large noncoding regions expands the potential of pooled CRISPR screens for fundamental genomic discovery and for elucidating biologically relevant mechanisms of gene regulation.
3,677 downloads molecular biology
Schaefer et al. recently advanced the provocative conclusion that CRISPR-Cas9 nuclease can induce off-target alterations at genomic loci that do not resemble the intended on-target site. Using high-coverage whole genome sequencing (WGS), these authors reported finding SNPs and indels in two CRISPR-Cas9-treated mice that were not present in a single untreated control mouse. On the basis of this association, Schaefer et al. concluded that these sequence variants were caused by CRISPR-Cas9. This new proposed CRISPR-Cas9 off-target activity runs contrary to previously published work and, if the authors are correct, could have profound implications for research and therapeutic applications. Here we demonstrate that the simplest interpretation of Schaefer et al.'s data is that the two CRISPR-Cas9-treated mice are actually more closely related genetically to each other than to the control mouse. This strongly suggests that the so-called “unexpected mutations” simply represent SNPs and indels shared in common by these mice prior to nuclease treatment. In addition, given the genomic and sequence distribution profiles of these variants, we show that it is challenging to explain how CRISPR-Cas9 might be expected to induce such changes. Finally, we argue that the lack of appropriate controls in Schaefer et al.'s experimental design precludes assignment of causality to CRISPR-Cas9. Given these substantial issues, we urge Schaefer et al. to revise or re-state the original conclusions of their published work so as to avoid leaving misleading and unsupported statements to persist in the literature.
3,536 downloads molecular biology
Gene tagging with fluorescent proteins (FPs) is essential to investigate the dynamic properties of cellular proteins. Clustered Regularly Interspaced Short Palindromic Repeats/Cas9 (CRISPR/Cas9) technology is a powerful tool for inserting fluorescent markers into all alleles of the gene of interest and permits functionality and physiological expression of the fusion protein. It is essential to evaluate such genome-edited cell lines carefully in order to preclude off-target effects caused by either (i) incorrect insertion of the FP, (ii) perturbation of the fusion protein by the FP or (iii) non-specific genomic DNA damage by CRISPR/Cas9. In this protocol, we provide a step-by-step description of our systematic pipeline to generate and validate homozygous fluorescent knock-in cell lines. We have used the paired Cas9D10A nickase approach to efficiently insert tags into specific genomic loci via homology-directed repair (HDR) with minimal off-target effects. Whole genome sequencing of each cell clone is neither time- nor cost-effective, and spontaneous genetic variations also occur in mammalian cell lines. Therefore we have developed an efficient validation pipeline of the generated cell lines consisting of junction PCR, Southern Blot analysis, Sanger sequencing, microscopy, Western blot analysis and live cell imaging for cell cycle dynamics, which takes 6-9 weeks. Using this pipeline, 70% of the targeted genes could be tagged homozygously with FPs and resulted in physiological levels and phenotypically functional expression of the fusion proteins. In contrast to a study that systematically tagged genes using CRISPR/Cas9 in human stem cells (1), our approach resulted in homozygously tagged proteins of interest.
3,169 downloads molecular biology
We describe a method for sequencing full-length 16S rRNA gene amplicons using the high throughput Illumina MiSeq platform. The resulting sequences have about 100-fold higher accuracy than standard Illumina reads and are chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. We demonstrate that the data provides fine scale phylogenetic resolution not available from Illumina amplicon methods targeting smaller variable regions of the 16S rRNA gene.
3,138 downloads molecular biology
Ishaan Gupta, Paul G Collier, Bettina Haase, Ahmed Mahfouz, Anoushka Joglekar, Taylor Floyd, Frank Koopmans, Ben Barres, August B. Smit, Steven Sloan, Wenjie Luo, Olivier Fedrigo, M Elizabeth Ross, Hagen U Tilgner
Full-length isoform sequencing has advanced our knowledge of isoform biology. However, apart from applying full-length isoform sequencing to very few single cells, isoform sequencing has been limited to bulk tissue, cell lines, or sorted cells. Single splicing events have been described for <=200 single cells with great statistical success, but these methods do not describe full-length mRNAs. Single cell short-read 3' sequencing has allowed identification of many cell sub-types, but full-length isoforms for these cell types have not been profiled. Using our new method of single-cell-isoform-RNA-sequencing (ScISOr-Seq) we determine isoform-expression in thousands of individual cells from a heterogeneous bulk tissue (cerebellum), without specific antibody-fluorescence activated cell sorting. We elucidate isoform usage in high-level cell types such as neurons, astrocytes and microglia and finer sub-types, such as Purkinje cells and Granule cells, including the combination patterns of distant splice sites, which for individual molecules requires long reads. We produce an enhanced genome annotation revealing cell-type specific expression of known and 16,872 novel (with respect to mouse Gencode version 10) isoforms (see isoformatlas.com).
3,052 downloads molecular biology
Recent methodological advances allowed the identification of an increasing number of RNA-binding proteins (RBPs) and their RNA-binding sites. Most of those methods rely, however, on capturing proteins associated with polyadenylated RNAs, which neglects RBPs bound to non-adenylated RNA classes (tRNA, rRNA, pre-mRNA) as well as the vast majority of species that lack poly-A tails in their mRNAs (including all archaea and bacteria). To overcome these limitations, we have developed a novel protocol, Phenol Toluol extraction (PTex), that does not rely on a specific RNA sequence or motif for isolation of cross-linked ribonucleoproteins (RNPs), but rather purifies them based entirely on their physicochemical properties. PTex captures RBPs that bind to RNA as short as 30 nt, RNPs directly from animal tissue and can be used to simplify complex workflows such as PAR-CLIP. Finally, we provide a first global RNA-bound proteome of human HEK293 cells and Salmonella Typhimurium as a bacterial species.
2,948 downloads molecular biology
The carboxy-terminal domain (CTD) of RNA polymerase (Pol) II is an intrinsically disordered low-complexity region that is critical for pre-mRNA transcription and processing. The CTD consists of hepta-amino acid repeats varying in number from 52 in humans to 26 in yeast. Here we report that human and yeast CTDs undergo cooperative liquid phase separation at increasing protein concentration, with the shorter yeast CTD forming less stable droplets. In human cells, truncation of the CTD to the length of the yeast CTD decreases Pol II clustering and chromatin association whereas CTD extension has the opposite effect. CTD droplets can incorporate intact Pol II and are dissolved by CTD phosphorylation with the transcription initiation factor IIH kinase CDK7. Together with published data, our results suggest that Pol II forms clusters/hubs at active genes through interactions between CTDs and with activators, and that CTD phosphorylation liberates Pol II enzymes from hubs for promoter escape and transcription elongation.
2,863 downloads molecular biology
The field of epitranscriptomics has undergone an enormous expansion in the last few years; however, a major limitation is the lack of generic methods to map RNA modifications transcriptome-wide. Here we show that using Oxford Nanopore Technologies, N6-methyladenosine (m6A) RNA modifications can be detected with high accuracy, in the form of systematic errors and decreased base-calling qualities. Our results open new avenues to investigate the universe of RNA modifications with single nucleotide resolution, in individual RNA molecules.
2,824 downloads molecular biology
Robert J Ihry, Kathleen A Worringer, Max R Salick, Elizabeth Frias, Dan Ho, Kraig Theriault, Sravya Kommineni, Julie Chen, Marie Sondey, Chaoyang Ye, Ranjit Randhawa, Tripti Kulkarni, Zinger Yang, Gregory McAllister, Carsten Russ, John Reece-Hoyes, William Forrester, Gregory R Hoffman, Ricardo Dolmetsch, Ajamete Kaykas
CRISPR/Cas9 has revolutionized our ability to engineer genomes and to conduct genome-wide screens in human cells. While some cell types are easily modified with Cas9, human pluripotent stem cells (hPSCs) poorly tolerate Cas9 and are difficult to engineer. Using a stable Cas9 cell line or transient delivery of ribonucleoproteins (RNPs) we achieved an average insertion or deletion efficiency greater than 80%. This high efficiency made it apparent that double strand breaks (DSBs) induced by Cas9 are toxic and kill most treated hPSCs. Cas9 toxicity creates an obstacle to the high-throughput use of CRISPR/Cas9 for genome-engineering and screening in hPSCs. We demonstrated the toxic response is tp53-dependent and the toxic effect of tp53 severely reduces the efficiency of precise genome-engineering in hPSCs. Our results highlight that CRISPR-based therapies derived from hPSCs should proceed with caution. Following engineering, it is critical to monitor for tp53 function, especially in hPSCs which spontaneously acquire tp53 mutations.
2,810 downloads molecular biology
A recently published research article reported that the extreme halophile archaebacterium Natronobacterium gregoryi Argonaute enzyme (NgAgo) could cleave the cellular DNA under physiological temperature conditions in a cell line and be implemented as an alternative to CRISPR/Cas9 genome editing technology. We assessed this claim in mouse zygotes for four loci (Sptb, Tet-1, Tet-2 and Tet-3) and in the human HEK293T cell line for the EMX1 locus. Over 100 zygotes were microinjected with nls-NgAgo-GK plasmid provided by Addgene and various concentrations of 5-phosphorylated guide DNA (gDNA) from 2.5 ng/µl to 50 ng/µl and cultured to the blastocyst stage of development. The presence of indels was verified using T7 endonuclease 1 assay (T7E1) and Sanger sequencing. We reported no evidence of successful editing of the mouse genome. We then assessed the lack of editing efficiency in the HEK293T cell line for the EMX1 endogenous locus by monitoring the NgAgo protein expression level and the editing efficiency by T7E1 assay and Sanger sequencing. We reported that the NgAgo protein was expressed from 8 hours to a maximum expression at 48 hours post-transfection, confirming the efficient delivery of the plasmid and the gDNA, but found no evidence of successful editing of the EMX1 target in any transfected sample. Together our findings indicate that we failed to edit using NgAgo.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!