Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 70,836 bioRxiv papers from 309,140 authors.

Molecular insights from conformational ensembles via machine learning

By O. Fleetwood, M.A. Kasimova, A.M. Westerlund, Lucie Delemotte

Posted 07 Jul 2019
bioRxiv DOI: 10.1101/695254 (published DOI: 10.1016/j.bpj.2019.12.016)

Biomolecular simulations are intrinsically high dimensional and generate noisy datasets of ever increasing size. Extracting important features from the data is crucial for understanding the biophysical properties of molecular processes, but remains a major challenge. Machine learning (ML) provides powerful dimensionality reduction tools. However, such methods are often criticized for resembling black boxes that offer limited human-interpretable insight. We use methods from supervised and unsupervised ML to efficiently create interpretable maps of important features from molecular simulations. We benchmark the performance of several methods, including neural networks, random forests and principal component analysis, using a toy model with properties reminiscent of macromolecular behavior. We then analyze three diverse biological processes: conformational changes within the soluble protein calmodulin, ligand binding to a G protein-coupled receptor and activation of an ion channel voltage-sensor domain, unravelling features critical for signal transduction, ligand binding and voltage sensing. This work demonstrates the usefulness of ML in understanding biomolecular states and demystifying complex simulations.

STATEMENT OF SIGNIFICANCE

Understanding how biomolecules function requires resolving the ensemble of structures they visit. Molecular dynamics simulations compute these ensembles and generate large amounts of data that can be noisy and need to be condensed for human interpretation. Machine learning methods are designed to process large amounts of data, but are often criticized for their black-box nature and have historically seen only modest use in the analysis of biomolecular systems. We demonstrate how machine learning tools can provide an interpretable overview of important features in a simulation dataset. We develop a protocol to quickly perform data-driven analysis of molecular simulations. This protocol is applied to identify the molecular basis of ligand binding to a receptor and of voltage sensitivity of an ion channel.
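
To make the idea of an "interpretable map of important features" concrete, here is a minimal sketch of one of the method families named in the abstract, principal component analysis, applied to a synthetic two-state dataset. It is an illustration under stated assumptions and not the authors' protocol: the toy data generator, the function names `make_toy_frames` and `pca_feature_importance`, and the choice to score each feature by its squared loading on the leading component are all hypothetical, and only NumPy is used.

```python
import numpy as np


def make_toy_frames(n_frames=400, n_features=10, informative=(2, 7),
                    shift=4.0, seed=0):
    """Hypothetical toy data: two states that differ only in `informative` features."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_frames // 2)          # state of each frame
    frames = rng.normal(0.0, 1.0, size=(n_frames, n_features))
    # Displace the informative features in state 1; all others stay pure noise.
    frames[np.ix_(labels == 1, informative)] += shift
    return frames, labels


def pca_feature_importance(frames, n_components=1):
    """Score features by variance-weighted squared loadings on the top components."""
    centered = frames - frames.mean(axis=0)
    # Rows of `components` are the principal axes, sorted by explained variance.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    weights = variance[:n_components] / variance[:n_components].sum()
    importance = (weights[:, None] * components[:n_components] ** 2).sum(axis=0)
    return importance / importance.max()               # normalise to [0, 1]


frames, labels = make_toy_frames()
importance = pca_feature_importance(frames)
top_two = sorted(np.argsort(importance)[-2:].tolist())
print("importance per feature:", np.round(importance, 2))
print("two most important features:", top_two)
```

In this sketch the two displaced features dominate the leading component, so they receive the highest scores. PCA is unsupervised and never sees `labels`; a supervised method such as a random forest would use the state labels directly, which is one of the contrasts the abstract says the authors benchmark.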

Download data

  • Downloaded 938 times
  • Download rankings, all-time:
    • Site-wide: 9,230 out of 70,838
    • In biophysics: 269 out of 3,019
  • Year to date:
    • Site-wide: 1,890 out of 70,838
  • Since beginning of last month:
    • Site-wide: 723 out of 70,838

[Plots not shown: downloads over time; distribution of downloads per paper, site-wide]