Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 89,581 bioRxiv papers from 383,801 authors.
Most downloaded bioRxiv papers, all time
in category synthetic biology
816 results found. For more information, click each entry to expand.
15,715 downloads synthetic biology
Methods of altering wild populations are most useful when inherently limited to local geographic areas. Here we describe a novel form of gene drive based on the introduction of multiple copies of an engineered 'daisy' sequence into repeated elements of the genome. Each introduced copy encodes guide RNAs that target one or more engineered loci carrying the CRISPR nuclease gene and the desired traits. When organisms encoding a drive system are released into the environment, each generation of mating with wild-type organisms will reduce the average number of the guide RNA elements per 'daisyfield' organism by half, serving as a generational clock. The loci encoding the nuclease and payload will exhibit drive only as long as a single copy remains, placing an inherent limit on the extent of spread.
13,665 downloads synthetic biology
DNA is an attractive medium to store digital information. Here, we report a storage strategy, called DNA Fountain, that is highly robust and approaches the information capacity per nucleotide. Using our approach, we stored a full computer operating system, movie, and other files with a total of 2.14x10^6 bytes in DNA oligos and perfectly retrieved the information from a sequencing coverage equivalent of a single tile of Illumina sequencing. We also tested a process that can allow 2.18x10^15 retrievals using the original DNA sample and were able to perfectly decode the data. Finally, we explored the limit of our architecture in terms of bytes per molecules and obtained a perfect retrieval from a density of 215Petabyte/gram of DNA, orders of magnitudes higher than previous techniques.
13,386 downloads synthetic biology
In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Unsupervised learning recovers information about protein structure: secondary structure and residue-residue contacts can be identified by linear projections from the learned representations. Training language models on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. The unsupervised models can be adapted with supervision from quantitative mutagenesis data to predict variant activity. Predictions from sequences alone are comparable to results from a state-of-the-art model of mutational effects that uses evolutionary and structurally derived features.
9,222 downloads synthetic biology
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabelled amino acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily, and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach reaches near state-of-the-art or superior performance predicting stability of natural and de novo designed proteins as well as quantitative function of molecularly diverse mutants. UniRep further enables two orders of magnitude cost savings in a protein engineering task. We conclude UniRep is a versatile protein summary that can be applied across protein engineering informatics.
8,008 downloads synthetic biology
Modern synthetic biology depends on the manufacture of large DNA constructs from libraries of genes, regulatory elements or other genetic parts. Type IIS-restriction enzyme-dependent DNA assembly methods (e.g., Golden Gate) enable rapid one-pot, ordered, multi-fragment DNA assembly, facilitating the generation of high-complexity constructs. The order of assembly of genetic parts is determined by the ligation of flanking Watson-Crick base-paired overhangs. The ligation of mismatched overhangs leads to erroneous assembly, and the need to avoid such pairings has typically been accomplished by using small sets of empirically vetted junction pairs, limiting the number of parts that can be joined in a single reaction. Here, we report the use of a comprehensive method for profiling end-joining ligation fidelity and bias to predict highly accurate sets of connections for ligation-based DNA assembly methods. This data set allows quantification of sequence-dependent ligation efficiency and identification of mismatch-prone pairings. The ligation profile accurately predicted junction fidelity in ten-fragment Golden Gate assembly reactions, and enabled efficient assembly of a lac cassette from up to 24-fragments in a single reaction. Application of the ligation fidelity profile to inform choice of junctions thus enables highly flexible assembly design, with >20 fragments in a single reaction.
7,491 downloads synthetic biology
Biology offers compelling proof that macroscopic "living materials" can emerge from reactions between diffusing biomolecules. Here, we show that molecular self-organization could be a similarly powerful approach for engineering functional synthetic materials. We introduce a programmable DNA-hydrogel that produces tunable patterns at the centimeter length scale. We generate these patterns by implementing chemical reaction networks through synthetic DNA complexes, embedding the complexes in hydrogel, and triggering with locally applied input DNA strands. We first demonstrate ring pattern formation around a circular input cavity and show that the ring width and intensity can be predictably tuned. Then, we create patterns of increasing complexity, including concentric rings and non-isotropic patterns. Finally, we show "destructive" and "constructive" interference patterns, by combining several ring-forming modules in the gel and triggering them from multiple sources. We further show that computer simulations based on the reaction-diffusion model can predict and inform the programming of target patterns.
7,174 downloads synthetic biology
Inheritance-biasing “gene drives” may be capable of spreading genomic alterations made in laboratory organisms through wild populations. We previously considered the potential for RNA-guided gene drives based on the versatile CRISPR/Cas9 genome editing system to serve as a general method of altering populations. Here we report molecularly contained gene drive constructs in the yeast Saccharomyces cerevisiae that are typically copied at rates above 99% when mated to wild yeast. We successfully targeted both non-essential and essential genes, showed that the inheritance of an unrelated “cargo” gene could be biased by an adjacent drive, and constructed a drive capable of overwriting and reversing changes made by a previous drive. Our results demonstrate that RNA-guided gene drives are capable of efficiently biasing inheritance when mated to wild-type organisms over successive generations.
6,009 downloads synthetic biology
Proteins---molecular machines that underpin all biological life---are of significant therapeutic and industrial value. Directed evolution is a high-throughput experimental approach for improving protein function, but has difficulty escaping local maxima in the fitness landscape. Here, we investigate how supervised learning in a closed loop with DNA synthesis and high-throughput screening can be used to improve protein design. Using the green fluorescent protein (GFP) as an illustrative example, we demonstrate the opportunities and challenges of generating training datasets conducive to selecting strongly generalizing models. With prospectively designed wet lab experiments, we then validate that these models can generalize to unseen regions of the fitness landscape, even when constrained to explore combinations of non-trivial mutations. Taken together, this suggests a hybrid optimization strategy for protein design in which a predictive model is used to explore difficult-to-access but promising regions of the fitness landscape that directed evolution can then exploit at scale.
6,002 downloads synthetic biology
We present here an approach for engineering evolving DNA barcodes in living cells. The methodology entails using a homing guide RNA (hgRNA) scaffold that directs the Cas9-hgRNA complex to target the DNA locus of the hgRNA itself. We show that this homing CRISPR-Cas9 system acts as an expressed genetic barcode that diversifies its sequence and that the rate of diversification can be controlled in cultured cells. We further evaluate these barcodes in cultured cell populations and show that they can record lineage history and and that their RNA can be assayed as single molecules in situ. This integrated approach will have wide ranging applications, such as in deep lineage tracing, cellular barcoding, molecular recording, dissecting cancer biology, and connectome mapping.
5,822 downloads synthetic biology
To extend the frontier of genome editing and enable the radical redesign of mammalian genomes, we developed a set of dead-Cas9 base editor (dBE) variants that allow editing at tens of thousands of loci per cell by overcoming the cell death associated with DNA double-strand breaks (DSBs) and single-strand breaks (SSBs). We used a set of gRNAs targeting repetitive elements – ranging in target copy number from about 31 to 124,000 per cell. dBEs enabled survival after large-scale base editing, allowing targeted mutations at up to ~13,200 and ~2610 loci in 293T and human induced pluripotent stem cells (hiPSCs), respectively, three orders of magnitude greater than previously recorded. These dBEs can overcome current on-target mutation and toxicity barriers that prevent cell survival after large-scale genome engineering. One Sentence Summary Base editing with reduced DNA nicking allows for the simultaneous editing of >10,000 loci in human cells.
5,780 downloads synthetic biology
Alejandro Chavez, Jonathan Scheiman, Suhani Vora, Benjamin W Pruitt, Marcelle Tuttle, Eswar Iyer, Samira Kiani, Christopher D Guzman, Daniel J. Wiegand, Dimtry Ter-Ovanesyan, Jonathan L Braff, Noah Davidsohn, Ron Weiss, John Aach, James J. Collins, George M. Church
The RNA-guided bacterial nuclease Cas9 can be reengineered as a programmable transcription factor by a series of changes to the Cas9 protein in addition to the fusion of a transcriptional activation domain (AD). However, the modest levels of gene activation achieved by current Cas9 activators have limited their potential applications. Here we describe the development of an improved transcriptional regulator through the rational design of a tripartite activator, VP64-p65-Rta (VPR), fused to Cas9. We demonstrate its utility in activating expression of endogenous coding and non-coding genes, targeting several genes simultaneously and stimulating neuronal differentiation of induced pluripotent stem cells (iPSCs).
5,700 downloads synthetic biology
Therapeutic antibody optimization is time and resource intensive, largely because it requires low-throughput screening (103 variants) of full-length IgG in mammalian cells, typically resulting in only a few optimized leads. Here, we use deep learning to interrogate and predict antigen-specificity from a massively diverse sequence space to identify globally optimized antibody variants. Using a mammalian display platform and the therapeutic antibody trastuzumab, rationally designed site-directed mutagenesis libraries are introduced by CRISPR/Cas9-mediated homology-directed repair (HDR). Screening and deep sequencing of relatively small libraries (104) produced high quality data capable of training deep neural networks that accurately predict antigen-binding based on antibody sequence. Deep learning is then used to predict millions of antigen binders from an in silico library of ~108 variants, where experimental testing of 30 randomly selected variants showed all 30 retained antigen specificity. The full set of in silico predicted binders is then subjected to multiple developability filters, resulting in thousands of highly-optimized lead candidates. With its scalability and capacity to interrogate high-dimensional protein sequence space, deep learning offers great potential for antibody engineering and optimization.
5,206 downloads synthetic biology
RNA-guided gene drive elements could address many ecological problems by altering the traits of wild organisms, but the likelihood of global spread tremendously complicates ethical development and use. Here we detail a localized form of CRISPR-based gene drive composed of genetic elements arranged in a daisy-chain such that each element drives the next. "Daisy drive" systems can duplicate any effect achievable using an equivalent global drive system, but their capacity to spread is limited by the successive loss of non-driving elements from the base of the chain. Releasing daisy drive organisms constituting a small fraction of the local wild population can drive a useful genetic element to local fixation for a wide range of fitness parameters without resulting in global spread. We additionally report numerous highly active guide RNA sequences sharing minimal homology that may enable evolutionary stable daisy drive as well as global CRISPR-based gene drive. Daisy drives could simplify decision-making and promote ethical use by enabling local communities to decide whether, when, and how to alter local ecosystems.
4,906 downloads synthetic biology
Protein engineering has enormous academic and industrial potential. However, it is limited by the lack of experimental assays that are consistent with the design goal and sufficiently high-throughput to find rare, enhanced variants. Here we introduce a machine learning-guided paradigm that can use as few as 24 functionally assayed mutant sequences to build an accurate virtual fitness landscape and screen ten million sequences via in silico directed evolution. As demonstrated in two highly dissimilar proteins, avGFP and TEM-1 β-lactamase, top candidates from a single round are diverse and as active as engineered mutants obtained from previous multi-year, high-throughput efforts. Because it distills information from both global and local sequence landscapes, our model approximates protein function even before receiving experimental data, and generalizes from only single mutations to propose high-functioning epistatically non-trivial designs. With reproducible >500% improvements in activity from a single assay in a 96-well plate, we demonstrate the strongest generalization observed in machine-learning guided protein design to date. Taken together, our approach enables efficient use of resource intensive high-fidelity assays without sacrificing throughput. By encouraging alignment with endpoint objectives, low-N design will accelerate engineered proteins into the fermenter, field, and clinic.
4,738 downloads synthetic biology
DNA is an emerging storage medium for digital data but its adoption is hampered by limitations of phosphoramidite chemistry, which was developed for single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom-up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144-bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design which stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information for large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.
4,135 downloads synthetic biology
Predicting the impact of cis-regulatory sequence on gene expression is a foundational challenge for biology. We combine polysome profiling of hundreds of thousands of randomized 5′ UTRs with deep learning to build a predictive model that relates human 5′ UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5′ UTRs that accurately target specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,000 truncated human 5′ UTRs and 3,577 naturally-occurring variants and show that the model accurately predicts ribosome loading of these sequences. Finally, we provide evidence of 47 SNVs associated with human diseases that cause a significant change in ribosome loading and thus a plausible molecular basis for disease.
4,014 downloads synthetic biology
The ability to longitudinally track and record molecular events in vivo would provide a unique opportunity to monitor signaling dynamics within cellular niches and to identify critical factors in orchestrating cellular behavior. We present a self-contained analog memory device that enables the recording of molecular stimuli in the form of DNA mutations in human cells. The memory unit consists of a self-targeting guide RNA (stgRNA) cassette that repeatedly directs Streptococcus pyogenes Cas9 nuclease activity towards the DNA that encodes the stgRNA, thereby enabling localized, continuous DNA mutagenesis as a function of stgRNA expression. We analyze the temporal sequence evolution dynamics of stgRNAs containing 20, 30 and 40 nucleotide SDSes (Specificity Determining Sequences) and create a population-based recording metric that conveys information about the duration and/or intensity of stgRNA activity. By expressing stgRNAs from engineered, inducible RNA polymerase (RNAP) III promoters, we demonstrate programmable and multiplexed memory storage in human cells triggered by doxycycline and isopropyl β-D-1-thiogalactopyranoside (IPTG). Finally, we show that memory units encoded in human cells implanted in mice are able to record lipopolysaccharide (LPS)-induced acute inflammation over time. This tool, which we call Mammalian Synthetic Cellular Recorder Integrating Biological Events (mSCRIBE), provides a unique strategy for investigating cell biology in vivo and in situ and may drive further applications that leverage continuous evolution of targeted DNA sequences in mammalian cells.
3,900 downloads synthetic biology
Recent reports have suggested that CRISPR-based gene drives are unlikely to invade wild populations due to drive-resistant alleles that prevent cutting. Here we develop mathematical models based on existing empirical data to explicitly test this assumption. We show that although resistance prevents drive systems from spreading to fixation in large populations, even the least effective systems reported to date are highly invasive. Releasing a small number of organisms often causes invasion of the local population, followed by invasion of additional populations connected by very low gene flow rates. Examining the effects of mitigating factors including standing variation, inbreeding, and family size revealed that none of these prevent invasion in realistic scenarios. Highly effective drive systems are predicted to be even more invasive. Contrary to the National Academies report on gene drive, our results suggest that standard drive systems should not be developed nor field-tested in regions harboring the host organism.
3,888 downloads synthetic biology
The many successes of synthetic biology have come in a manner largely different from those in other engineering disciplines; in particular, without well-characterized and simplified prototyping environments to play a role analogous to wind-tunnels in aerodynamics and breadboards in electrical engineering. However, as the complexity of synthetic circuits increases, the benefits?in cost savings and design cycle time?of a more traditional engineering approach can be significant. We have recently developed an in vitro ?breadboard? prototyping platform based on E. coli cell extract that allows biocircuits to operate in an environment considerably simpler than but functionally similar to in vivo. The simplicity of this system makes it a promising tool for rapid biocircuit design and testing, as well as for probing fundamental aspects of gene circuit operation normally masked by cellular complexity. In this work we characterize the cell-free breadboard using real-time and simultaneous measurements of transcriptional and translational activities of a small set of reporter genes and a transcriptional activation cascade. We determine the effects of promoter strength, gene concentration, and nucleoside triphosphate concentration on biocircuit properties, and we isolate the specific contributions of essential biomolecular resources?core RNA polymerase and ribosomes?to overall performance. Importantly, we show how limits on resources, particularly those involved in translation, are manifested as reduced expression in the presence of orthogonal genes that serve as additional loads on the system.
3,853 downloads synthetic biology
Cellular processes are carried out by many interacting genes and their study and optimization requires multiple levers by which they can be independently controlled. The most common method is via a genetically-encoded sensor that responds to a small molecule (an "inducible system"). However, these sensors are often suboptimal, exhibiting high background expression and low dynamic range. Further, using multiple sensors in one cell is limited by cross-talk and the taxing of cellular resources. Here, we have developed a directed evolution strategy to simultaneously select for less background, high dynamic range, increased sensitivity, and low crosstalk. Libraries of the regulatory protein and output promoter are built based on random and rationally-guided mutations. This is applied to generate a set of 12 high-performance sensors, which exhibit >100-fold induction with low background and cross-reactivity. These are combined to build a single "sensor array" and inserted into the genomes of E. coli MG1655 (wild-type), DH10B (cloning), and BL21 (protein expression). These "Marionette" strains allow for the independent control of gene expression using 2,4-diacetylphophloroglucinol (DAPG), cuminic acid (Cuma), 3-oxohexanoyl-homoserine lactone (OC6), vanillic acid (Van), isopropyl β-D-1-thiogalactopyranoside (IPTG), anhydrotetracycline (aTc), L-arabinose (Ara), choline chloride (Cho), naringenin (Nar), 3,4-dihydroxybenzoic acid (DHBA), sodium salicylate (Sal), and 3-hydroxytetradecanoyl-homoserine lactone (OHC14).
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!