Rxivist logo

Combining statistical and neural network approaches to derive energy functions for completely flexible protein backbone design

By Bin Huang, Yang Xu, Haiyan Liu

Posted 18 Jun 2019
bioRxiv DOI: 10.1101/673897

A designable protein backbone is one for which amino acid sequences that stably fold into it exist. To design such backbones, a general method is much needed for continuous sampling and optimization in the backbone conformational space without specific amino acid sequence information. The energy functions driving such sampling and optimization must faithfully recapitulate the characteristically coupled distributions of multiplexes of local and non-local conformational variables in designable backbones. It is also desired that the energy surfaces are continuous and smooth, with easily computable gradients. We combine statistical and neural network (NN) approaches to derive a model named SCUBA, standing for Side-Chain-Unspecialized-Backbone-Arrangement. In this approach, high-dimensional statistical energy surfaces learned from known protein structures are analytically represented as NNs. SCUBA is composed as a sum of NN terms describing local and non-local conformational energies, each NN term derived by first estimating the statistical energies in the corresponding multi-variable space via neighbor-counting (NC) with adaptive cutoffs, and then training the NN with the NC-estimated energies. To determine the relative weights of different energy terms, SCUBA-driven stochastic dynamics (SD) simulations of natural proteins are considered. As initial computational tests of SCUBA, we apply SD simulated annealing to automatically optimize artificially constructed polypeptide backbones of different fold classes. For a majority of the resulting backbones, structurally matching native backbones can be found with Dali Z-scores above 6 and less than 2 angstroms displacements of main chain atoms in aligned secondary structures. The results suggest that SCUBA-driven sampling and optimization can be a general tool for protein backbone design with complete conformational flexibility. In addition, the NC-NN approach can be generally applied to develop continuous, noise-filtered multi-variable statistical models from structural data. Linux executables to setup and run SCUBA SD simulations are publicly available (http://biocomp.ustc.edu.cn/servers/download_scuba.php). Interested readers may contact the authors for source code availability.

Download data

  • Downloaded 495 times
  • Download rankings, all-time:
    • Site-wide: 88,706
    • In bioinformatics: 7,468
  • Year to date:
    • Site-wide: 91,901
  • Since beginning of last month:
    • Site-wide: 174,091

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide