Deep generative model driven protein folding simulations

Heng Ma, Debsindhu Bhowmik, Hyungro Lee, Matteo Turilli, Michael Young, Shantenu Jha, Arvind Ramanathan

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

Significant progress in computer hardware and software have enabled molecular dynamics (MD) simulations to model complex biological phenomena such as protein folding. However, enabling MD simulations to access biologically relevant timescales (e.g., beyond milliseconds) still remains challenging. These limitations include (1) quantifying which set of states have already been (sufficiently) sampled in an ensemble of MD runs, and (2) identifying novel states from which simulations can be initiated to sample rare events (e.g., sampling folding events). With the recent success of deep learning and artificial intelligence techniques in analyzing large datasets, we posit that these techniques can also be used to adaptively guide MD simulations to model such complex biological phenomena. Leveraging our recently developed unsupervised deep learning technique to cluster protein folding trajectories into partially folded intermediates, we build an iterative workflow that enables our generative model to be coupled with all-atom MD simulations to fold small protein systems on emerging high performance computing platforms. We demonstrate our approach in folding Fs-peptide and the β-β- α (BBA) fold, FSD-EY. Our adaptive workflow enables us to achieve an overall root-mean squared deviation (RMSD) to the native state of 1.6 A and 4.4 A respectively for Fs-peptide and FSD-EY. We also highlight some emerging challenges in the context of designing scalable workflows when data intensive deep learning techniques are coupled to compute intensive MD simulations.

Original languageEnglish
Title of host publicationParallel Computing
Subtitle of host publicationTechnology Trends
EditorsIan Foster, Gerhard R. Joubert, Ludek Kucera, Wolfgang E. Nagel, Frans Peters
PublisherIOS Press BV
Pages45-55
Number of pages11
ISBN (Electronic)9781643680705
DOIs
StatePublished - 2020

Publication series

NameAdvances in Parallel Computing
Volume36
ISSN (Print)0927-5452
ISSN (Electronic)1879-808X

Funding

We thank Vivek Balasubramanian and Jumana Dakka for helpful discussions and early contributions. We also thank Chris Layton for helping set up our runs on the NVIDIA DGX-2 compute systems. We also acknowledge support by NSF DIBBS 1443054 and NSF RADICAL-Cybertools 1440677. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. We also acknowledge support by NSF DIBBS 1443054 and NSF RADICAL-Cybertools 1440677. This manuscript has been authored by UTBattelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05- 00OR22725

FundersFunder number
DOE Public Access Plan
United States Government
National Science FoundationDE-AC05-00OR22725, 1440677, 1443054
U.S. Department of Energy
Office of ScienceDE-AC05

    Keywords

    • Deep learning
    • Molecular dynamics
    • Protein folding
    • Workflows

    Fingerprint

    Dive into the research topics of 'Deep generative model driven protein folding simulations'. Together they form a unique fingerprint.

    Cite this