Targeted optimization of regulatory DNA sequences with neural editing architectures

By Anvita Gupta, Anshul Kundaje

Posted 25 Jul 2019
bioRxiv DOI: 10.1101/714402

Targeted optimization of existing DNA sequences for useful properties has the potential to enable several synthetic biology applications, from modifying DNA to treat genetic disorders to designing regulatory elements that fine-tune context-specific gene expression. Current approaches for targeted genome editing are largely based on prior biological knowledge or ad hoc rules. Few, if any, machine learning approaches exist for targeted optimization of regulatory DNA sequences. Here, we propose a novel generative neural network architecture for targeted DNA sequence editing, the EDA architecture, consisting of an encoder, a decoder, and an analyzer. We showcase the use of EDA to optimize regulatory DNA sequences to bind to the transcription factor SPI1. Compared with other state-of-the-art approaches, such as a textual variational autoencoder and rule-based editing, EDA significantly improves the predicted SPI1 binding of genomic sequences with a minimal set of edits. We also use EDA to design regulatory elements with optimized grammars of CREB1 binding sites that can tune reporter expression levels as measured by massively parallel reporter assays (MPRA). We analyze the properties of the binding sites in the edited sequences and find patterns that are consistent with previously reported grammatical rules, which tie gene expression to CRE binding site density, spacing, and affinity.

Download data

  • Downloaded 724 times
  • Download rankings, all-time:
    • Site-wide: 17,747 out of 84,639
    • In bioinformatics: 2,682 out of 8,115
  • Year to date:
    • Site-wide: 13,650 out of 84,639
  • Since beginning of last month:
    • Site-wide: 10,229 out of 84,639

