Protein conformational states—a first principles Bayesian method

Research output: Contribution to journal › Article › peer-review

Abstract

Automated identification of protein conformational states from simulation of an ensemble of structures is a hard problem because it requires teaching a computer to recognize shapes. We adapt the naïve Bayes classifier from the machine learning community for use on atom-to-atom pairwise contacts. The result is an unsupervised learning algorithm that samples a ‘distribution’ over potential classification schemes. We apply the classifier to a series of test structures and one real protein, showing that it identifies the conformational transition with > 95% accuracy in most cases. A nontrivial feature of our adaptation is a new connection to information entropy that allows us to vary the level of structural detail without spoiling the categorization. This is confirmed by comparing results as the number of atoms and time-samples are varied over 1.5 orders of magnitude. Further, the method’s derivation from Bayesian analysis on the set of inter-atomic contacts makes it easy to understand and extend to more complex cases.
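The idea of sampling classification schemes from binary contact data can be illustrated with a minimal sketch. It assumes each simulation frame has already been reduced to a list of 0/1 contact indicators, and it uses a plain Gibbs sampler for a Bernoulli mixture with uniform Beta(1, 1) and Dirichlet(1) priors. The sampler, the priors, the function name `gibbs_bernoulli_mixture`, and the toy data are all assumptions made for illustration; the abstract does not specify the authors' actual algorithm, and the entropy-based treatment of structural detail is not reproduced here.

```python
import math
import random


def gibbs_bernoulli_mixture(X, n_states=2, n_sweeps=100, seed=0):
    """Sample state labels for frames X (lists of 0/1 contact indicators).

    Hypothetical helper: a textbook Gibbs sampler for a Bernoulli mixture,
    not the paper's implementation. Returns one label list per sweep.
    """
    rng = random.Random(seed)
    n_frames, n_contacts = len(X), len(X[0])
    z = [rng.randrange(n_states) for _ in range(n_frames)]  # random initial labels
    samples = []
    for _ in range(n_sweeps):
        # Count frames per state and how often each contact is formed in it.
        size = [0] * n_states
        on = [[0] * n_contacts for _ in range(n_states)]
        for x, k in zip(X, z):
            size[k] += 1
            for j, xj in enumerate(x):
                on[k][j] += xj
        # State weights ~ Dirichlet(1 + size), drawn as normalized gamma variates.
        g = [rng.gammavariate(1.0 + s, 1.0) for s in size]
        total = sum(g)
        log_w = [math.log(gk / total) for gk in g]
        # Contact probabilities ~ Beta(1 + formed, 1 + broken), clamped away from 0 and 1.
        theta = [
            [
                min(max(rng.betavariate(1.0 + on[k][j], 1.0 + size[k] - on[k][j]), 1e-9), 1 - 1e-9)
                for j in range(n_contacts)
            ]
            for k in range(n_states)
        ]
        # Resample each frame's label from its naive Bayes posterior over states.
        for i, x in enumerate(X):
            logp = []
            for k in range(n_states):
                lp = log_w[k]
                for j, xj in enumerate(x):
                    lp += math.log(theta[k][j] if xj else 1.0 - theta[k][j])
                logp.append(lp)
            m = max(logp)
            weights = [math.exp(lp - m) for lp in logp]
            z[i] = rng.choices(range(n_states), weights=weights)[0]
        samples.append(list(z))
    return samples


# Toy data (made up): 20 frames with contacts 0-5 formed, then 20 with contacts 6-11 formed.
state_a = [1] * 6 + [0] * 6
state_b = [0] * 6 + [1] * 6
X = [state_a] * 20 + [state_b] * 20
samples = gibbs_bernoulli_mixture(X)
final_labels = samples[-1]
```

Each entry of `samples` is one candidate classification of the frames, so the list as a whole approximates a distribution over classification schemes. The state labels themselves are arbitrary: any comparison with a reference labeling has to allow for label permutation.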

Original language: English
Article number: 1242
Pages (from-to): 1-12
Number of pages: 12
Journal: Entropy
Volume: 22
Issue number: 11
State: Published - Nov 2020

Funding

Research sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory (ORNL). This research used resources of the Oak Ridge Leadership Computing Facility at ORNL. ORNL is managed by UT-Battelle, LLC, for the US Department of Energy under contract DE-AC05-00OR22725.

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Author affiliation: National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA.

Keywords

  • Bayesian clustering
  • Bernoulli mixture
  • Unsupervised classification
