Deep clustering of protein folding simulations

Debsindhu Bhowmik, Shang Gao, Michael T. Young, Arvind Ramanathan

Research output: Contribution to journal › Article › peer-review

60 Scopus citations

Abstract

Background: We examine the problem of clustering biomolecular simulations using deep learning techniques. Since biomolecular simulation datasets are inherently high dimensional, it is often necessary to build low dimensional representations that can be used to extract quantitative insights into the atomistic mechanisms that underlie complex biological processes.

Results: We use a convolutional variational autoencoder (CVAE) to learn low dimensional, biophysically relevant latent features from long time-scale protein folding simulations in an unsupervised manner. We demonstrate our approach on three model protein folding systems, namely Fs-peptide (14 μs aggregate sampling), villin head piece (single trajectory of 125 μs) and β-β-α (BBA) protein (223 + 102 μs sampling across two independent trajectories). In these systems, we show that the CVAE latent features learned correspond to distinct conformational substates along the protein folding pathways. The CVAE model predicts, on average, nearly 89% of all contacts within the folding trajectories correctly, while being able to extract folded, unfolded and potentially misfolded states in an unsupervised manner. Further, the CVAE model can be used to learn latent features of protein folding that can be applied to other independent trajectories, making it particularly attractive for identifying intrinsic features that correspond to conformational substates that share similar structural features.

Conclusions: Together, we show that the CVAE model can quantitatively describe complex biophysical processes such as protein folding.
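
The abstract reports reconstruction quality in terms of contacts predicted correctly. As a rough illustration only, the sketch below builds a binary residue contact map from coordinates and scores how well a reconstruction agrees with it. The 8 Å cutoff, the use of one representative atom per residue, the 0.5 threshold, and the function names `contact_map` and `contact_agreement` are all assumptions made for this example; the abstract does not specify them, and the agreement score here is only one possible reading of "contacts predicted correctly".

```python
import numpy as np


def contact_map(coords, cutoff=8.0):
    """Binary contact map from an (N, 3) coordinate array.

    Assumption: one representative atom per residue (e.g. C-alpha),
    coordinates in angstroms, and an 8 angstrom cutoff.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise displacement vectors, shape (N, N, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).astype(np.uint8)


def contact_agreement(true_map, recon_probs, threshold=0.5):
    """Fraction of map entries where the thresholded reconstruction
    matches the input contact map (hypothetical metric for illustration)."""
    recon_map = (np.asarray(recon_probs) >= threshold).astype(np.uint8)
    return float(np.mean(recon_map == np.asarray(true_map)))


if __name__ == "__main__":
    # Toy three-residue chain: residues 0 and 1 are close, residue 2 is far away.
    toy_coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
    cmap = contact_map(toy_coords)
    # Pretend decoder output: identical to the input except one missed contact.
    recon = cmap.astype(float)
    recon[0, 1] = 0.1
    print(cmap)
    print(contact_agreement(cmap, recon))
```

In a real workflow the reconstruction would come from the decoder of a trained autoencoder applied to each trajectory frame, and the score would be averaged over frames.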

Original language: English
Article number: 484
Journal: BMC Bioinformatics
Volume: 19
State: Published - Dec 21 2018

Funding

The authors would like to thank D. E. Shaw Research for providing access to the protein folding simulation trajectories of BBA and VHP. The authors also thank the MSMBuilder team for making their Fs-Peptide simulations available online. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, Oak Ridge National Laboratory under Contract DE-AC05-00OR22725, and Frederick National Laboratory for Cancer Research under Contract HHSN261200800001E. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Publication costs were funded in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health and the Laboratory Director’s Research and Development Fund.

Funders (with funder number where listed)
National Institutes of Health
U.S. Department of Energy
National Cancer Institute
Office of Science
Argonne National Laboratory: DE-AC02-06-CH11357
Lawrence Livermore National Laboratory: DE-AC52-07NA27344
Oak Ridge National Laboratory: DE-AC05-00OR22725
Los Alamos National Laboratory: DE-AC5206NA25396
Frederick National Laboratory for Cancer Research: HHSN261200800001E

    Keywords

    • Conformational substates
    • Deep learning
    • Protein folding
    • Variational autoencoder
