Rxivist logo

Molecular insights from conformational ensembles via machine learning

By Oliver Fleetwood, M. A. Kasimova, A.M. Westerlund, Lucie Delemotte

Posted 07 Jul 2019
bioRxiv DOI: 10.1101/695254 (published DOI: 10.1016/j.bpj.2019.12.016)

Biomolecular simulations are intrinsically high dimensional and generate noisy datasets of ever increasing size. Extracting important features in the data is crucial for understanding the biophysical properties of molecular processes, but remains a big challenge. Machine learning (ML) provides powerful dimensionality reduction tools. However, such methods are often criticized to resemble black boxes with limited human-interpretable insight. We use methods from supervised and unsupervised ML to efficiently create interpretable maps of important features from molecular simulations. We benchmark the performance of several methods including neural networks, random forests and principal component analysis, using a toy model with properties reminiscent of macromolecular behavior. We then analyze three diverse biological processes: conformational changes within the soluble protein calmodulin, ligand binding to a G protein-coupled receptor and activation of an ion channel voltage-sensor domain, unravelling features critical for signal transduction, ligand binding and voltage sensing. This work demonstrates the usefulness of ML in understanding biomolecular states and demystifying complex simulations. STATEMENT OF SIGNIFICANCE Understanding how biomolecules function requires resolving the ensemble of structures they visit. Molecular dynamics simulations compute these ensembles and generate large amounts of data that can be noisy and need to be condensed for human interpretation. Machine learning methods are designed to process large amounts of data, but are often criticized for their black-box nature and have historically been modestly used in the analysis of biomolecular systems. We demonstrate how machine learning tools can provide an interpretable overview of important features in a simulation dataset. We develop a protocol to quickly perform data-driven analysis of molecular simulations. This protocol is applied to identify the molecular basis of ligand binding to a receptor and of voltage sensitivity of an ion channel.

Download data

  • Downloaded 1,127 times
  • Download rankings, all-time:
    • Site-wide: 9,433 out of 89,238
    • In biophysics: 290 out of 3,891
  • Year to date:
    • Site-wide: 11,123 out of 89,238
  • Since beginning of last month:
    • Site-wide: 27,247 out of 89,238

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)