Abstract
A critical step in structure-based drug discovery is predicting whether and how a candidate molecule binds to a model of a therapeutic target. However, substantial protein side chain movements prevent current screening methods, such as docking, from accurately predicting ligand conformations, and they necessitate expensive refinement to produce viable candidates. We present a high-throughput and flexible ligand pose refinement workflow called "tinyIFD". Its main features are the use of mdgx.cuda, a specialized high-throughput MD simulation code for small systems, and an active learning model zoo approach. We apply this workflow to a large test set of diverse protein targets, achieving 66% and 76% success rates for finding a crystal-like pose within the top-2 and top-5 poses, respectively. We also apply the workflow to SARS-CoV-2 main protease (Mpro) inhibitors, where we demonstrate the benefit of its active learning component.
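As an illustration of how a top-k success rate like the ones quoted above can be computed, the following is a minimal sketch, not the authors' evaluation code. It assumes each target has a list of pose RMSDs to the crystal structure, ordered by the workflow's ranking, and that "crystal-like" means an RMSD at or below a cutoff. The 2.0 Å default, the function name `top_k_success_rate`, and the example values are all assumptions made for illustration; the abstract does not state the criterion used.

```python
from typing import Dict, List


def top_k_success_rate(
    ranked_rmsds: Dict[str, List[float]],
    k: int,
    rmsd_cutoff: float = 2.0,  # assumed cutoff in angstroms, not from the paper
) -> float:
    """Return the fraction of targets with a crystal-like pose in the top k.

    ranked_rmsds maps a target name to the RMSDs (to the crystal pose) of its
    predicted poses, ordered from best-ranked to worst-ranked.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ranked_rmsds:
        raise ValueError("no targets supplied")
    hits = sum(
        1
        for rmsds in ranked_rmsds.values()
        if any(r <= rmsd_cutoff for r in rmsds[:k])
    )
    return hits / len(ranked_rmsds)


# Hypothetical RMSD values, invented purely to exercise the function.
example = {
    "target_a": [1.2, 3.5, 4.0],
    "target_b": [3.1, 1.8, 5.2],
    "target_c": [4.4, 3.9, 1.5],
    "target_d": [6.0, 5.5, 4.8],
}

top1 = top_k_success_rate(example, k=1)
top2 = top_k_success_rate(example, k=2)
top3 = top_k_success_rate(example, k=3)
```

A target counts as a success once any pose within the first k ranks meets the cutoff, so the rate can only stay the same or grow as k increases.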
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 3438-3447 |
Number of pages | 10 |
Journal | Journal of Chemical Information and Modeling |
Volume | 63 |
Issue number | 11 |
State | Published - Jun 12 2023 |
Funding
This paper has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors thank Dilipkumar N. Asthagiri for constructive discussions. CARES Act funding to the Oak Ridge Leadership Computing Facility (OLCF) through DOE ASCR in support of this research is also acknowledged, as is the Laboratory Directed Research and Development Program at Oak Ridge National Laboratory (ORNL). This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725.