Supercomputer-Based Ensemble Docking Drug Discovery Pipeline with Application to Covid-19

A. Acharya, R. Agarwal, M. B. Baker, J. Baudry, D. Bhowmik, S. Boehm, K. G. Byler, S. Y. Chen, L. Coates, C. J. Cooper, O. Demerdash, I. Daidone, J. D. Eblen, S. Ellingson, S. Forli, J. Glaser, J. C. Gumbart, J. Gunnels, O. Hernandez, S. Irle, D. W. Kneller, A. Kovalevsky, J. Larkin, T. J. Lawrence, S. Legrand, S. H. Liu, J. C. Mitchell, G. Park, J. M. Parks, A. Pavlova, L. Petridis, D. Poole, L. Pouchard, A. Ramanathan, D. M. Rogers, D. Santos-Martins, A. Scheinberg, A. Sedova, Y. Shen, J. C. Smith, M. D. Smith, C. Soto, A. Tsaris, M. Thavappiragasam, A. F. Tillack, J. V. Vermaas, V. Q. Vuong, J. Yin, S. Yoo, M. Zahran, L. Zanetti-Polzi

Research output: Contribution to journal › Article › peer-review

137 Scopus citations

Abstract

We present a supercomputer-driven pipeline for in silico drug discovery using enhanced sampling molecular dynamics (MD) and ensemble docking. Ensemble docking makes use of MD results by docking compound databases into representative protein binding-site conformations, thus taking into account the dynamic properties of the binding sites. We also describe preliminary results obtained for 24 systems involving eight proteins of the proteome of SARS-CoV-2. The MD involves temperature replica exchange enhanced sampling, making use of massively parallel supercomputing to quickly sample the configurational space of protein drug targets. Using the Summit supercomputer at the Oak Ridge National Laboratory, more than 1 ms of enhanced sampling MD can be generated per day. We have ensemble docked repurposing databases to 10 configurations of each of the 24 SARS-CoV-2 systems using AutoDock Vina. Comparison to experiment demonstrates remarkably high hit rates for the top scoring tranches of compounds identified by our ensemble approach. We also demonstrate that, using AutoDock-GPU on Summit, it is possible to perform exhaustive docking of one billion compounds in under 24 h. Finally, we discuss preliminary results and planned improvements to the pipeline, including the use of quantum mechanical (QM), machine learning, and artificial intelligence (AI) methods to cluster MD trajectories and rescore docking poses.
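
The ensemble idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not say how per-conformation scores are combined, so taking the best (lowest) score across conformations is an assumption made here for illustration, as are the function names `rank_by_best_score` and `required_rate` and the example scores, which are invented. The second function only restates the arithmetic behind docking one billion compounds in 24 h.

```python
from typing import Dict, List, Tuple


def rank_by_best_score(scores: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank compounds by their best (lowest) docking score over all conformations.

    Assumption: lower scores are better (as with docking scores reported in
    kcal/mol) and the ensemble score is the minimum over conformations.
    """
    best = {name: min(per_conf) for name, per_conf in scores.items() if per_conf}
    # Sort ascending so the most favorable compound comes first.
    return sorted(best.items(), key=lambda item: item[1])


def required_rate(n_compounds: int, hours: float) -> float:
    """Average compounds per second needed to dock n_compounds within `hours`."""
    return n_compounds / (hours * 3600.0)


# Hypothetical scores: one list entry per binding-site conformation.
example_scores = {
    "compound_a": [-6.1, -7.4, -5.9],
    "compound_b": [-8.2, -6.0, -6.3],
    "compound_c": [-5.0, -5.2, -4.8],
}

ranking = rank_by_best_score(example_scores)
rate = required_rate(1_000_000_000, 24)

for name, score in ranking:
    print(f"{name}: {score:.1f}")
print(f"Required throughput: {rate:.0f} compounds/s")
```

In this sketch a compound that binds well to only one conformation still ranks highly, which is the motivation for docking against several conformations rather than a single static structure. Other aggregation rules (for example, an average over conformations) would be equally easy to substitute.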

Original language: English
Pages (from-to): 5832-5852
Number of pages: 21
Journal: Journal of Chemical Information and Modeling
Volume: 60
Issue number: 12
State: Published - Dec 28 2020

Funding

This work was made possible in part by a grant of high-performance computing resources and technical support from the Alabama Supercomputer Authority to J.B. and K.B. J.C.G. was supported by the National Institutes of Health under Grant No. NIH R01-AI148740. C.J.C. was supported by a National Science Foundation Graduate Research Fellowship under Grant No. 2017219379. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725, and of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231. This research was supported by the Cancer Research Informatics Shared Resource Facility of the University of Kentucky Markey Cancer Center (P30CA177558) and the University of Kentucky's Center for Computational Sciences (CCS) high-performance computing resources. Computer time on Summit was granted by the HPC Covid-19 Consortium.

Funders | Funder number
HPC Covid-19 Consortium |
National Institute of Health |
U.S. Department of Energy Office of Science |
National Science Foundation | 2017219379
National Science Foundation |
National Institutes of Health |
U.S. Department of Energy | DE-AC05-00OR22725, DE-AC02-05CH11231
U.S. Department of Energy |
National Institute of Allergy and Infectious Diseases | R01AI148740
National Institute of Allergy and Infectious Diseases |
Office of Science |
University of Kentucky |
Markey Cancer Center, University of Kentucky | P30CA177558
Markey Cancer Center, University of Kentucky |
National Energy Research Scientific Computing Center |
