The ability to perform de novo protein design would allow researchers to expand the pool and variety of available proteins. By designing synthetic structures computationally, they can draw on more structures than are available in the Protein Data Bank, design structures that are not found in nature, or direct a design towards a specific desired structure. While some researchers attempt to design proteins from first physical and thermodynamic principles, we tested whether de novo design of helical protein backbones alone can be performed statistically, using machine learning, by building a model with a long short-term memory generative adversarial network (LSTM-GAN) architecture. The LSTM-based GAN used only the ϕ and ψ angles of each residue as features, taken from an augmented dataset containing only helical protein structures. The generated backbone structures were not perfect, so they were idealised and evaluated after generation, and non-ideal structures were filtered out while adequate ones were kept. This approach produced logical, rigid, compact, helical protein backbone topologies. This paper is a proof of concept showing that a novel helical backbone topology can be generated by an LSTM-GAN architecture using only the ϕ and ψ angles as features. The next step is to sequence-design these backbone topologies to form complete protein structures.
Author summary
This research project stemmed from the desire to expand the pool of protein structures that can be used as scaffolds in computational vaccine design, because the Protein Data Bank did not contain enough structures to allow great diversity or a high probability of grafting a target motif onto a protein scaffold.
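The abstract says that non-ideal generated backbones were filtered out after generation, but it does not state the criterion used. The sketch below is a minimal illustration under a purely hypothetical criterion: a backbone is kept if enough of its residues have (ϕ, ψ) pairs close to the right-handed α-helix region of the Ramachandran plot. The helix centre, tolerance, threshold, and all function names are illustrative assumptions, not the authors' parameters or pipeline.

```python
from typing import List, Sequence, Tuple

# Hypothetical filter settings: assumptions for illustration only,
# not values reported by the authors.
HELIX_PHI = -63.0      # degrees, approximate alpha-helix phi
HELIX_PSI = -43.0      # degrees, approximate alpha-helix psi
TOLERANCE = 30.0       # degrees allowed from the centre, per angle
MIN_FRACTION = 0.8     # fraction of residues that must look helical

Backbone = Sequence[Tuple[float, float]]  # one (phi, psi) pair per residue


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees, wrapping at +/-180."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def is_helical(phi: float, psi: float, tol: float = TOLERANCE) -> bool:
    """True if both dihedrals lie within `tol` degrees of the assumed helix centre."""
    return (angular_difference(phi, HELIX_PHI) <= tol
            and angular_difference(psi, HELIX_PSI) <= tol)


def helical_fraction(backbone: Backbone) -> float:
    """Fraction of residues whose (phi, psi) pair falls in the assumed helical region."""
    if not backbone:
        return 0.0
    return sum(is_helical(phi, psi) for phi, psi in backbone) / len(backbone)


def filter_backbones(backbones: List[Backbone],
                     min_fraction: float = MIN_FRACTION) -> List[Backbone]:
    """Keep only generated backbones that meet the helical-fraction threshold."""
    return [bb for bb in backbones if helical_fraction(bb) >= min_fraction]


# Made-up angle lists standing in for generator output.
ideal = [(-63.0, -43.0)] * 10
mixed = [(-63.0, -43.0)] * 5 + [(-120.0, 130.0)] * 5  # second half is strand-like
kept = filter_backbones([ideal, mixed])  # keeps only `ideal`
```

The wrap-around in `angular_difference` matters because dihedrals are periodic: 179° and -179° are only 2° apart. A real evaluation would likely also check 3D geometry (clashes, compactness), which a per-residue angle test cannot capture.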
A protein backbone can be defined by the ϕ and ψ angles of each amino acid in the polypeptide, which effectively translates the protein's 3D structure into a table of numbers. Because protein structures are not random, this numerical representation can be used to train a neural network to mathematically generalise what a protein structure is, and that generalisation can then be used to generate new protein structures. Instead of using every protein in the Protein Data Bank, a curated dataset was used, containing protein structures with specific characteristics that should, in theory, make them easy to evaluate computationally and chemically. This paper details how a trained neural network generated logical helical protein backbone structures.
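Describing a backbone by its ϕ and ψ angles alone works when the remaining backbone geometry (bond lengths, bond angles, and the peptide-bond torsion ω) is treated as fixed. The sketch below illustrates this with the Natural Extension Reference Frame (NeRF) method: it rebuilds N, CA and C coordinates from a list of (ϕ, ψ) pairs and then recovers the angles from the coordinates. It assumes commonly used ideal bond lengths and angles (approximately the Engh and Huber values) and a trans ω of 180° for every residue; the function names are hypothetical, and this is not the authors' reconstruction code.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]

# Assumed ideal geometry (approximately Engh & Huber), in angstroms and degrees.
BOND_N_CA, BOND_CA_C, BOND_C_N = 1.458, 1.525, 1.329
ANGLE_N_CA_C, ANGLE_CA_C_N, ANGLE_C_N_CA = 111.2, 116.2, 121.7
OMEGA = 180.0  # assumption: every peptide bond is trans

# Minimal 3-vector helpers using tuples.
def sub(a: Vec, b: Vec) -> Vec: return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def add(a: Vec, b: Vec) -> Vec: return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def scale(a: Vec, s: float) -> Vec: return (a[0] * s, a[1] * s, a[2] * s)
def dot(a: Vec, b: Vec) -> float: return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
def norm(a: Vec) -> float: return math.sqrt(dot(a, a))
def unit(a: Vec) -> Vec: return scale(a, 1.0 / norm(a))


def place_atom(a: Vec, b: Vec, c: Vec, bond: float, angle: float, torsion: float) -> Vec:
    """NeRF step: place d so |cd| = bond, angle(b, c, d) = angle, dihedral(a, b, c, d) = torsion."""
    th, tau = math.radians(angle), math.radians(torsion)
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))  # normal of the a-b-c plane
    m = cross(n, bc)                # completes the local frame (bc, m, n)
    local = (-bond * math.cos(th),
             bond * math.sin(th) * math.cos(tau),
             bond * math.sin(th) * math.sin(tau))
    return add(c, add(scale(bc, local[0]), add(scale(m, local[1]), scale(n, local[2]))))


def dihedral(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral angle a-b-c-d in degrees, in (-180, 180]."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2)))


def build_backbone(angles: Sequence[Tuple[float, float]]) -> List[Vec]:
    """Rebuild N, CA, C coordinates (three atoms per residue) from (phi, psi) pairs.

    phi of the first residue and psi of the last residue are unused,
    because they depend on atoms outside the chain.
    """
    if not angles:
        return []
    t = math.radians(ANGLE_N_CA_C)
    n0, ca0 = (0.0, 0.0, 0.0), (BOND_N_CA, 0.0, 0.0)
    c0 = add(ca0, (-BOND_CA_C * math.cos(t), BOND_CA_C * math.sin(t), 0.0))
    atoms: List[Vec] = [n0, ca0, c0]
    for i in range(1, len(angles)):
        n_prev, ca_prev, c_prev = atoms[-3], atoms[-2], atoms[-1]
        n = place_atom(n_prev, ca_prev, c_prev, BOND_C_N, ANGLE_CA_C_N, angles[i - 1][1])  # psi(i-1)
        ca = place_atom(ca_prev, c_prev, n, BOND_N_CA, ANGLE_C_N_CA, OMEGA)               # omega(i-1)
        c = place_atom(c_prev, n, ca, BOND_CA_C, ANGLE_N_CA_C, angles[i][0])               # phi(i)
        atoms += [n, ca, c]
    return atoms


def backbone_angles(atoms: Sequence[Vec]) -> List[Tuple[Optional[float], Optional[float]]]:
    """Recover (phi, psi) per residue from N, CA, C coordinates; undefined terminal angles are None."""
    count = len(atoms) // 3
    result = []
    for i in range(count):
        n, ca, c = atoms[3 * i], atoms[3 * i + 1], atoms[3 * i + 2]
        phi = dihedral(atoms[3 * i - 1], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, atoms[3 * i + 3]) if i < count - 1 else None
        result.append((phi, psi))
    return result


if __name__ == "__main__":
    helix = [(-57.0, -47.0)] * 8  # classic alpha-helix angles, used only as example input
    coords = build_backbone(helix)
    print(len(coords), backbone_angles(coords)[1:3])
```

Because bond lengths, bond angles and ω are held fixed in this sketch, the (ϕ, ψ) table fully determines the backbone and the round trip is exact up to floating-point error. Experimental structures deviate slightly from these ideal values, so converting PDB coordinates to angles and back would not reproduce them exactly.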
- Downloaded 839 times
- Download rankings, all-time:
  - Site-wide: 10,902 out of 70,330
  - In bioinformatics: 1,831 out of 6,889
- Year to date:
  - Site-wide: 3,194 out of 70,330
- Since beginning of last month:
  - Site-wide: 982 out of 70,330