Using diverse potentials and scoring functions for the development of improved machine-learned models for protein–ligand affinity and docking pose prediction

Omar N.A. Demerdash

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

The advent of computational drug discovery holds the promise of significantly reducing the effort of experimentalists, along with monetary cost. More generally, predicting the binding of small organic molecules to biological macromolecules has far-reaching implications for a range of problems, including metabolomics. However, problems such as predicting the bound structure of a protein–ligand complex along with its affinity have proven to be an enormous challenge. In recent years, machine learning-based methods have proven to be more accurate than older methods, many based on simple linear regression. Nonetheless, there remains room for improvement, as these methods are often trained on a small set of features, with a single functional form for any given physical effect, and often with little mention of the rationale behind choosing one functional form over another. Moreover, it is not entirely clear why one machine learning method is favored over another. In this work, we endeavor to undertake a comprehensive effort towards developing high-accuracy, machine-learned scoring functions, systematically investigating the effects of machine learning method and choice of features, and, when possible, providing insights into the relevant physics using methods that assess feature importance. Here, we show synergism among disparate features, yielding adjusted R² with experimental binding affinities of up to 0.871 on an independent test set and enrichment for native bound structures of up to 0.913. When purely physical terms that model enthalpic and entropic effects are used in the training, we use feature importance assessments to probe the relevant physics and hopefully guide future investigators working on this and other computational chemistry problems.
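The abstract quotes an adjusted R² but this page does not define it. As a minimal sketch, assuming the standard definition 1 − (1 − R²)(n − 1)/(n − p − 1) for n samples and p features, the statistic could be computed as follows. The function name and the example values are hypothetical and are not taken from the article.

```python
def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 under the standard definition (an assumption here;
    the article may use a different convention)."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    mean_y = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Penalize R^2 for the number of features used in the model
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


# Hypothetical affinities (e.g. pKd) and predictions, for illustration only
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.0, 2.0, 3.0, 4.0, 6.0]
score = adjusted_r2(measured, predicted, n_features=1)
```

Because the penalty grows with the number of features, adjusted R² is a natural choice when comparing models trained on feature sets of different sizes.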

Original language: English
Pages (from-to): 1095-1123
Number of pages: 29
Journal: Journal of Computer-Aided Molecular Design
Volume: 35
Issue number: 11
State: Published - Nov 2021

Funding

The author would like to thank Julie C. Mitchell for guidance. This work was funded through the Laboratory Directed Research and Development Program at Oak Ridge National Laboratory (LOIS ID: 9207). This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Funders (funder number)
DOE Public Access Plan
LOIS (DE-AC05-00OR22725, 9207)
U.S. Department of Energy
Oak Ridge National Laboratory

    Keywords

    • Binding affinity
    • Binding-pose prediction
    • Docking
    • Machine learning
    • Rescoring
