Rxivist logo

DeCoDe: degenerate codon design for complete protein-coding DNA libraries

By Tyler C. Shimko, Polly Fordyce, Yaron Orenstein

Posted 17 Oct 2019
bioRxiv DOI: 10.1101/809004

Motivation: High-throughput protein screening is a critical technique for dissecting and designing protein function. Libraries for these assays can be created through a number of means, including targeted or random mutagenesis of a template protein sequence or direct DNA synthesis. However, mutagenic library construction methods often yield vastly more non-functional than functional variants and, despite advances in large-scale DNA synthesis, individual synthesis of each desired DNA template is often prohibitively expensive. Consequently, many protein screening libraries rely on the use of degenerate codons (DCs), mixtures of DNA bases incorporated at specific positions during DNA synthesis, to generate highly diverse protein variant pools from only a few low-cost synthesis reactions. However, selecting DCs for sets of sequences that covary at multiple positions dramatically increases the difficulty of designing a DC library and leads to the creation of many undesired variants that can quickly outstrip screening capacity. Results: We introduce a novel algorithm for total DC library optimization, DeCoDe, based on integer linear programming. DeCoDe significantly outperforms state-of-the-art DC optimization algorithms and scales well to more than a hundred proteins sharing complex patterns of covariation (e.g. the lab-derived avGFP lineage). Moreover, DeCoDe is, to our knowledge, the first DC design algorithm with the capability to encode mixed-length protein libraries. We anticipate DeCoDe to be broadly useful for a variety of library generation problems, ranging from protein engineering attempts that leverage mutual information to the reconstruction of ancestral protein states.

Download data

  • Downloaded 256 times
  • Download rankings, all-time:
    • Site-wide: 49,790 out of 76,920
    • In bioinformatics: 5,579 out of 7,431
  • Year to date:
    • Site-wide: 23,355 out of 76,920
  • Since beginning of last month:
    • Site-wide: 23,059 out of 76,920

Altmetric data


Downloads over time

Distribution of downloads per paper, site-wide


PanLingua

Sign up for the Rxivist weekly newsletter! (Click here for more details.)


News