Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 52,519 bioRxiv papers from 243,473 authors.
Most downloaded bioRxiv papers, all time
in category synthetic biology
494 results found.
11,013 downloads synthetic biology
DNA is an attractive medium to store digital information. Here, we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14x10^6 bytes in DNA oligos and perfectly retrieved the information from a sequencing coverage equivalent to a single tile of Illumina sequencing. We also tested a process that can allow 2.18x10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecule and obtained a perfect retrieval from a density of 215 petabytes/gram of DNA, orders of magnitude higher than previous techniques.
8,508 downloads synthetic biology
Methods of altering wild populations are most useful when inherently limited to local geographic areas. Here we describe a novel form of gene drive based on the introduction of multiple copies of an engineered 'daisy' sequence into repeated elements of the genome. Each introduced copy encodes guide RNAs that target one or more engineered loci carrying the CRISPR nuclease gene and the desired traits. When organisms encoding a drive system are released into the environment, each generation of mating with wild-type organisms will reduce the average number of the guide RNA elements per 'daisyfield' organism by half, serving as a generational clock. The loci encoding the nuclease and payload will exhibit drive only as long as a single copy remains, placing an inherent limit on the extent of spread.
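The generational clock described above can be illustrated with a little arithmetic. The following is a minimal sketch, assuming an idealized deterministic halving of the average guide RNA copy number in every outcrossed generation; the function names and the starting value of 32 copies are hypothetical and do not come from the preprint, which would need a population-genetic model for anything realistic.

```python
from typing import List


def average_guide_copies(initial_copies: int, generations: int) -> List[float]:
    """Average guide RNA elements per organism after each generation of
    mating with wild-type organisms (idealized: exact halving each time)."""
    return [initial_copies / 2 ** g for g in range(generations + 1)]


def generations_with_drive(initial_copies: int) -> int:
    """Number of outcrossed generations for which the average copy number
    stays at or above one, i.e. while drive could still occur on average."""
    generations = 0
    copies = float(initial_copies)
    while copies / 2 >= 1:
        copies /= 2
        generations += 1
    return generations


if __name__ == "__main__":
    # Hypothetical starting point: 32 guide RNA elements per released organism.
    print(average_guide_copies(32, 6))
    print(generations_with_drive(32))
```

Under this simplification the number of generations grows only logarithmically with the number of copies introduced, which is why the copy number acts as a clock.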
6,858 downloads synthetic biology
Inheritance-biasing “gene drives” may be capable of spreading genomic alterations made in laboratory organisms through wild populations. We previously considered the potential for RNA-guided gene drives based on the versatile CRISPR/Cas9 genome editing system to serve as a general method of altering populations. Here we report molecularly contained gene drive constructs in the yeast Saccharomyces cerevisiae that are typically copied at rates above 99% when mated to wild yeast. We successfully targeted both non-essential and essential genes, showed that the inheritance of an unrelated “cargo” gene could be biased by an adjacent drive, and constructed a drive capable of overwriting and reversing changes made by a previous drive. Our results demonstrate that RNA-guided gene drives are capable of efficiently biasing inheritance when mated to wild-type organisms over successive generations.
6,012 downloads synthetic biology
In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.
5,729 downloads synthetic biology
We present here an approach for engineering evolving DNA barcodes in living cells. The methodology entails using a homing guide RNA (hgRNA) scaffold that directs the Cas9-hgRNA complex to target the DNA locus of the hgRNA itself. We show that this homing CRISPR-Cas9 system acts as an expressed genetic barcode that diversifies its sequence and that the rate of diversification can be controlled in cultured cells. We further evaluate these barcodes in cultured cell populations and show that they can record lineage history and that their RNA can be assayed as single molecules in situ. This integrated approach will have wide-ranging applications, such as in deep lineage tracing, cellular barcoding, molecular recording, dissecting cancer biology, and connectome mapping.
5,430 downloads synthetic biology
Alejandro Chavez, Jonathan Scheiman, Suhani Vora, Benjamin W Pruitt, Marcelle Tuttle, Eswar Iyer, Samira Kiani, Christopher D Guzman, Daniel J Wiegand, Dmitry Ter-Ovanesyan, Jonathan L Braff, Noah Davidsohn, Ron Weiss, John Aach, James J Collins, George M Church
The RNA-guided bacterial nuclease Cas9 can be reengineered as a programmable transcription factor by a series of changes to the Cas9 protein in addition to the fusion of a transcriptional activation domain (AD). However, the modest levels of gene activation achieved by current Cas9 activators have limited their potential applications. Here we describe the development of an improved transcriptional regulator through the rational design of a tripartite activator, VP64-p65-Rta (VPR), fused to Cas9. We demonstrate its utility in activating expression of endogenous coding and non-coding genes, targeting several genes simultaneously and stimulating neuronal differentiation of induced pluripotent stem cells (iPSCs).
5,257 downloads synthetic biology
Modern synthetic biology depends on the manufacture of large DNA constructs from libraries of genes, regulatory elements or other genetic parts. Type IIS-restriction enzyme-dependent DNA assembly methods (e.g., Golden Gate) enable rapid one-pot, ordered, multi-fragment DNA assembly, facilitating the generation of high-complexity constructs. The order of assembly of genetic parts is determined by the ligation of flanking Watson-Crick base-paired overhangs. The ligation of mismatched overhangs leads to erroneous assembly, and the need to avoid such pairings has typically been met by using small sets of empirically vetted junction pairs, limiting the number of parts that can be joined in a single reaction. Here, we report the use of a comprehensive method for profiling end-joining ligation fidelity and bias to predict highly accurate sets of connections for ligation-based DNA assembly methods. This data set allows quantification of sequence-dependent ligation efficiency and identification of mismatch-prone pairings. The ligation profile accurately predicted junction fidelity in ten-fragment Golden Gate assembly reactions, and enabled efficient assembly of a lac cassette from up to 24 fragments in a single reaction. Application of the ligation fidelity profile to inform choice of junctions thus enables highly flexible assembly design, with >20 fragments in a single reaction.
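To make the overhang-pairing problem concrete, here is a minimal sketch of the simplest screen one might run on a candidate junction set. It is not the ligation fidelity profile reported in the abstract: it only flags exact Watson-Crick conflicts (duplicates, palindromes, and reverse-complement pairs) and knows nothing about mismatch-prone pairings. The function names and the example overhangs are hypothetical.

```python
from typing import List, Tuple

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_overhang_conflicts(overhangs: List[str]) -> List[Tuple[str, str, str]]:
    """Flag overhangs that could ligate to the wrong partner by exact pairing.

    Returns (overhang_a, overhang_b, reason) tuples. Mismatch ligation,
    which the abstract's fidelity profile addresses, is NOT modeled here.
    """
    seqs = [o.upper() for o in overhangs]
    problems = []
    for i, a in enumerate(seqs):
        if a == reverse_complement(a):
            # A palindromic overhang can ligate to another copy of itself.
            problems.append((a, a, "palindromic"))
        for b in seqs[i + 1:]:
            if a == b:
                problems.append((a, b, "duplicate"))
            elif a == reverse_complement(b):
                problems.append((a, b, "reverse complements"))
    return problems


if __name__ == "__main__":
    # Hypothetical four-base overhangs for illustration only.
    print(find_overhang_conflicts(["GGAG", "CTCC", "AATT", "GCTT"]))
```

A set that passes this screen can still misassemble through mismatched ligation, which is the gap the profiling data described above is meant to close.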
4,885 downloads synthetic biology
RNA-guided gene drive elements could address many ecological problems by altering the traits of wild organisms, but the likelihood of global spread tremendously complicates ethical development and use. Here we detail a localized form of CRISPR-based gene drive composed of genetic elements arranged in a daisy-chain such that each element drives the next. "Daisy drive" systems can duplicate any effect achievable using an equivalent global drive system, but their capacity to spread is limited by the successive loss of non-driving elements from the base of the chain. Releasing daisy drive organisms constituting a small fraction of the local wild population can drive a useful genetic element to local fixation for a wide range of fitness parameters without resulting in global spread. We additionally report numerous highly active guide RNA sequences sharing minimal homology that may enable evolutionarily stable daisy drive as well as global CRISPR-based gene drive. Daisy drives could simplify decision-making and promote ethical use by enabling local communities to decide whether, when, and how to alter local ecosystems.
4,507 downloads synthetic biology
To extend the frontier of genome editing and enable the radical redesign of mammalian genomes, we developed a set of dead-Cas9 base editor (dBE) variants that allow editing at tens of thousands of loci per cell by overcoming the cell death associated with DNA double-strand breaks (DSBs) and single-strand breaks (SSBs). We used a set of gRNAs targeting repetitive elements, ranging in target copy number from about 31 to 124,000 per cell. dBEs enabled survival after large-scale base editing, allowing targeted mutations at up to ~13,200 and ~2,610 loci in 293T and human induced pluripotent stem cells (hiPSCs), respectively, three orders of magnitude greater than previously recorded. These dBEs can overcome current on-target mutation and toxicity barriers that prevent cell survival after large-scale genome engineering.
3,898 downloads synthetic biology
Proteins, molecular machines that underpin all biological life, are of significant therapeutic and industrial value. Directed evolution is a high-throughput experimental approach for improving protein function, but has difficulty escaping local maxima in the fitness landscape. Here, we investigate how supervised learning in a closed loop with DNA synthesis and high-throughput screening can be used to improve protein design. Using the green fluorescent protein (GFP) as an illustrative example, we demonstrate the opportunities and challenges of generating training datasets conducive to selecting strongly generalizing models. With prospectively designed wet lab experiments, we then validate that these models can generalize to unseen regions of the fitness landscape, even when constrained to explore combinations of non-trivial mutations. Taken together, this suggests a hybrid optimization strategy for protein design in which a predictive model is used to explore difficult-to-access but promising regions of the fitness landscape that directed evolution can then exploit at scale.
3,832 downloads synthetic biology
DNA is an emerging storage medium for digital data but its adoption is hampered by limitations of phosphoramidite chemistry, which was developed for the single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144 bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design which stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information for large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.
3,797 downloads synthetic biology
The ability to longitudinally track and record molecular events in vivo would provide a unique opportunity to monitor signaling dynamics within cellular niches and to identify critical factors in orchestrating cellular behavior. We present a self-contained analog memory device that enables the recording of molecular stimuli in the form of DNA mutations in human cells. The memory unit consists of a self-targeting guide RNA (stgRNA) cassette that repeatedly directs Streptococcus pyogenes Cas9 nuclease activity towards the DNA that encodes the stgRNA, thereby enabling localized, continuous DNA mutagenesis as a function of stgRNA expression. We analyze the temporal sequence evolution dynamics of stgRNAs containing 20, 30 and 40 nucleotide SDSes (Specificity Determining Sequences) and create a population-based recording metric that conveys information about the duration and/or intensity of stgRNA activity. By expressing stgRNAs from engineered, inducible RNA polymerase (RNAP) III promoters, we demonstrate programmable and multiplexed memory storage in human cells triggered by doxycycline and isopropyl β-D-1-thiogalactopyranoside (IPTG). Finally, we show that memory units encoded in human cells implanted in mice are able to record lipopolysaccharide (LPS)-induced acute inflammation over time. This tool, which we call Mammalian Synthetic Cellular Recorder Integrating Biological Events (mSCRIBE), provides a unique strategy for investigating cell biology in vivo and in situ and may drive further applications that leverage continuous evolution of targeted DNA sequences in mammalian cells.
3,628 downloads synthetic biology
Recent reports have suggested that CRISPR-based gene drives are unlikely to invade wild populations due to drive-resistant alleles that prevent cutting. Here we develop mathematical models based on existing empirical data to explicitly test this assumption. We show that although resistance prevents drive systems from spreading to fixation in large populations, even the least effective systems reported to date are highly invasive. Releasing a small number of organisms often causes invasion of the local population, followed by invasion of additional populations connected by very low gene flow rates. Examining the effects of mitigating factors including standing variation, inbreeding, and family size revealed that none of these prevent invasion in realistic scenarios. Highly effective drive systems are predicted to be even more invasive. Contrary to the National Academies report on gene drive, our results suggest that standard drive systems should not be developed nor field-tested in regions harboring the host organism.
3,586 downloads synthetic biology
The many successes of synthetic biology have come in a manner largely different from those in other engineering disciplines; in particular, without well-characterized and simplified prototyping environments to play a role analogous to wind-tunnels in aerodynamics and breadboards in electrical engineering. However, as the complexity of synthetic circuits increases, the benefits (in cost savings and design cycle time) of a more traditional engineering approach can be significant. We have recently developed an in vitro "breadboard" prototyping platform based on E. coli cell extract that allows biocircuits to operate in an environment considerably simpler than but functionally similar to in vivo. The simplicity of this system makes it a promising tool for rapid biocircuit design and testing, as well as for probing fundamental aspects of gene circuit operation normally masked by cellular complexity. In this work we characterize the cell-free breadboard using real-time and simultaneous measurements of transcriptional and translational activities of a small set of reporter genes and a transcriptional activation cascade. We determine the effects of promoter strength, gene concentration, and nucleoside triphosphate concentration on biocircuit properties, and we isolate the specific contributions of essential biomolecular resources (core RNA polymerase and ribosomes) to overall performance. Importantly, we show how limits on resources, particularly those involved in translation, are manifested as reduced expression in the presence of orthogonal genes that serve as additional loads on the system.
3,351 downloads synthetic biology
Cellular processes are carried out by many interacting genes and their study and optimization requires multiple levers by which they can be independently controlled. The most common method is via a genetically-encoded sensor that responds to a small molecule (an "inducible system"). However, these sensors are often suboptimal, exhibiting high background expression and low dynamic range. Further, using multiple sensors in one cell is limited by cross-talk and the taxing of cellular resources. Here, we have developed a directed evolution strategy to simultaneously select for less background, high dynamic range, increased sensitivity, and low crosstalk. Libraries of the regulatory protein and output promoter are built based on random and rationally-guided mutations. This is applied to generate a set of 12 high-performance sensors, which exhibit >100-fold induction with low background and cross-reactivity. These are combined to build a single "sensor array" and inserted into the genomes of E. coli MG1655 (wild-type), DH10B (cloning), and BL21 (protein expression). These "Marionette" strains allow for the independent control of gene expression using 2,4-diacetylphloroglucinol (DAPG), cuminic acid (Cuma), 3-oxohexanoyl-homoserine lactone (OC6), vanillic acid (Van), isopropyl β-D-1-thiogalactopyranoside (IPTG), anhydrotetracycline (aTc), L-arabinose (Ara), choline chloride (Cho), naringenin (Nar), 3,4-dihydroxybenzoic acid (DHBA), sodium salicylate (Sal), and 3-hydroxytetradecanoyl-homoserine lactone (OHC14).
3,220 downloads synthetic biology
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabelled amino acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily, and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach reaches near state-of-the-art or superior performance predicting stability of natural and de novo designed proteins as well as quantitative function of molecularly diverse mutants. UniRep further enables two orders of magnitude cost savings in a protein engineering task. We conclude UniRep is a versatile protein summary that can be applied across protein engineering informatics.
2,946 downloads synthetic biology
Directed evolution is a powerful approach for engineering biomolecules and understanding adaptation. However, experimental strategies for directed evolution are notoriously low-throughput, limiting access to demanding functions, multiple functions in parallel, and the study of molecular evolution in replicate. Here, we report OrthoRep, a yeast orthogonal DNA polymerase-plasmid pair that stably mutates ~100,000-fold faster than the host genome in vivo, exceeding error thresholds of genomic replication that lead to single-generation extinction. User-defined genes in OrthoRep continuously and rapidly evolve through serial passaging, a highly scalable process. Using OrthoRep, we evolved drug resistant malarial DHFRs 90 times and uncovered a more complex fitness landscape than previously realized. We find rare fitness peaks that resist the maximum soluble concentration of the antimalarial pyrimethamine (these resistant variants support growth at pyrimethamine concentrations >40,000-fold higher than the wild-type enzyme can tolerate) and also find that epistatic interactions direct adaptive trajectories to convergent outcomes. OrthoRep enables a new paradigm of routine, high-throughput evolution of biomolecular and cellular function.
2,911 downloads synthetic biology
Cellular barcoding using nuclease-induced genetic mutations is an effective approach that is emerging for recording biological information, including developmental lineages. We have previously introduced the homing CRISPR system as a promising methodology for generating such barcodes with scalable diversity and without crosstalk. Here, we present a mouse line (MARC1) with multiple genomically-integrated and heritable homing guide RNAs (hgRNAs). We determine the genomic locations of these hgRNAs, their activity profiles during gestation, and the diversity of their mutants. We apply the line for unique barcoding of mouse embryos and differential barcoding of embryonic tissues. We conclude that this mouse line can address the unique challenges associated with in vivo barcoding in mammalian model organisms and is thus an enabling platform for recording and lineage tracing applications in a mammalian model system.
2,782 downloads synthetic biology
Predicting the impact of cis-regulatory sequence on gene expression is a foundational challenge for biology. We combine polysome profiling of hundreds of thousands of randomized 5′ UTRs with deep learning to build a predictive model that relates human 5′ UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5′ UTRs that accurately target specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,000 truncated human 5′ UTRs and 3,577 naturally-occurring variants and show that the model accurately predicts ribosome loading of these sequences. Finally, we provide evidence of 47 SNVs associated with human diseases that cause a significant change in ribosome loading and thus a plausible molecular basis for disease.
2,619 downloads synthetic biology
RNA-based regulation, such as RNA interference, and CRISPR/Cas transcription factors (CRISPR-TFs), can enable scalable synthetic gene circuits and the modulation of endogenous networks but have yet to be integrated together. Here, we combined multiple mammalian RNA regulatory strategies, including RNA triple helix structures, introns, microRNAs, and ribozymes, with Cas9-based CRISPR-TFs and Cas6/Csy4-based RNA processing in human cells. We describe three complementary strategies for expressing functional gRNAs from transcripts generated by RNA polymerase II (RNAP II) promoters while allowing the harboring gene to be translated. These architectures enable the multiplexed expression of proteins and multiple gRNAs from a single compact transcript for efficient modulation of synthetic constructs and endogenous human promoters. We used these regulatory tools to implement tunable synthetic gene circuits, including multi-stage transcriptional cascades. Finally, we show that Csy4 can rewire regulatory connections in RNA-dependent gene circuits with multiple outputs and feedback loops to achieve complex functional behaviors. This multiplexable toolkit will be valuable for the construction of scalable gene circuits and the perturbation of natural regulatory networks in human cells for basic biology, therapeutic, and synthetic-biology applications.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!