RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative adversarial neural network

By Sari Sabban, Mikhail Markovsky

Posted 14 Jun 2019
bioRxiv DOI: 10.1101/671552

The ability to perform de novo protein design will allow researchers to expand the pool and variety of available proteins. By designing synthetic structures computationally, they can utilise more structures than are available in the Protein Data Bank, design structures that are not found in nature, or direct the design of proteins toward a specific desired structure. While some researchers attempt to design proteins from first physical and thermodynamic principles, we decided to test whether it is possible to perform de novo helical protein design of just the backbone statistically, using machine learning. To do this we built a model with a long short-term memory (LSTM) generative adversarial network (GAN) architecture. The LSTM-based GAN model used only the ϕ and ψ angles of each residue, taken from an augmented dataset containing only helical protein structures. Although the backbone structures generated by the network were not perfect, they were idealised and evaluated after generation: the non-ideal structures were filtered out and the adequate structures kept. The approach successfully produced a logical, rigid, compact, helical protein backbone topology. This paper is a proof of concept showing that it is possible to generate a novel helical backbone topology with an LSTM-GAN architecture using only the ϕ and ψ angles as features. The next step is to use these backbone topologies and design sequences for them to form complete protein structures.

Author summary

This research project stemmed from the desire to expand the pool of protein structures that can be used as scaffolds in computational vaccine design. The number of structures available from the Protein Data Bank was not sufficient to allow for great diversity or to increase the probability of grafting a target motif onto a protein scaffold.
A protein's backbone can be defined by the ϕ and ψ angles of each amino acid in the polypeptide, assuming ideal bond lengths, ideal bond angles, and a planar trans peptide bond. This means a protein's 3D structure can effectively be translated into a table of numbers. Because protein structures are not random, this numerical representation can be used to train a neural network to mathematically generalise what a protein structure is, and that generalisation can then be used to generate new protein structures. Instead of using all proteins in the Protein Data Bank, we used a curated dataset of protein structures with specific characteristics that should, theoretically, allow them to be easily evaluated computationally and chemically. This paper details how a trained neural network was able to generate logical helical protein backbone structures.
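To make the "table of numbers" idea concrete, here is a minimal sketch of how backbone coordinates can be turned into a per-residue (ϕ, ψ) table. It assumes each residue is given as a tuple of (N, CA, C) Cartesian coordinates. The function names `dihedral` and `phi_psi` are hypothetical helpers written for illustration; they are not code from the preprint. The standard definitions are used: ϕ_i is the torsion C(i-1), N(i), CA(i), C(i), and ψ_i is the torsion N(i), CA(i), C(i), N(i+1).

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: Sequence[Tuple[Vec, Vec, Vec]]
            ) -> List[Tuple[Optional[float], Optional[float]]]:
    """Convert per-residue (N, CA, C) coordinates into a (phi, psi) table.

    The first residue has no phi and the last residue has no psi,
    so those entries are None.
    """
    table: List[Tuple[Optional[float], Optional[float]]] = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = (dihedral(n, ca, c, backbone[i + 1][0])
               if i + 1 < len(backbone) else None)
        table.append((phi, psi))
    return table


if __name__ == "__main__":
    # Toy coordinates chosen for simple geometry, not a real protein.
    toy = [((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
           ((0.0, 1.0, 1.0), (0.0, 1.0, 2.0), (1.0, 1.0, 2.0))]
    for phi, psi in phi_psi(toy):
        print(phi, psi)
```

In practice, a structure library such as Biopython does this directly from a PDB file: its `Polypeptide.get_phi_psi_list()` method returns the same table in radians. The going-the-other-way step, rebuilding 3D coordinates from generated angles, additionally needs the ideal bond lengths and angles mentioned above.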
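Once a backbone is represented as a (ϕ, ψ) table, a simple screen for helical content can be written directly against it. This might be used when curating a helices-only dataset or when discarding clearly non-helical generated backbones. The sketch below is hypothetical. The Ramachandran window and the `min_fraction` threshold are illustrative assumptions and are not the selection or filtering criteria used in the preprint. The box is simply centred loosely on the canonical right-handed α-helix region near ϕ ≈ -57°, ψ ≈ -47°.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical alpha-helical box on the Ramachandran plot, in degrees.
# These bounds are illustrative assumptions, not values from the preprint.
PHI_RANGE = (-100.0, -30.0)
PSI_RANGE = (-80.0, -10.0)

Angles = Sequence[Tuple[Optional[float], Optional[float]]]


def is_helical(phi: float, psi: float) -> bool:
    """True if (phi, psi) falls inside the assumed alpha-helical box."""
    return (PHI_RANGE[0] <= phi <= PHI_RANGE[1]
            and PSI_RANGE[0] <= psi <= PSI_RANGE[1])


def helical_fraction(angles: Angles) -> float:
    """Fraction of residues (with both angles defined) inside the helical box."""
    defined = [(phi, psi) for phi, psi in angles
               if phi is not None and psi is not None]
    if not defined:
        return 0.0
    return sum(is_helical(phi, psi) for phi, psi in defined) / len(defined)


def keep_backbone(angles: Angles, min_fraction: float = 0.8) -> bool:
    """Hypothetical filter: keep a backbone if enough residues look helical."""
    return helical_fraction(angles) >= min_fraction


if __name__ == "__main__":
    ideal_helix = [(None, -47.0)] + [(-57.0, -47.0)] * 8 + [(-57.0, None)]
    extended = [(-120.0, 130.0)] * 10
    print(keep_backbone(ideal_helix), keep_backbone(extended))
```

A per-residue box like this ignores loops and tertiary packing. A real filter for compact helical bundles would likely also need 3D checks such as clashes or radius of gyration.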