Machine learning-based prediction of enzyme substrate scope: Application to bacterial nitrilases

Zhongyu Mou, Jason Eakes, Connor J. Cooper, Carmen M. Foster, Robert F. Standaert, Mircea Podar, Mitchel J. Doktycz, Jerry M. Parks

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

Predicting the range of substrates accepted by an enzyme from its amino acid sequence is challenging. Although sequence- and structure-based annotation approaches are often accurate for predicting broad categories of substrate specificity, they generally cannot predict which specific molecules will be accepted as substrates for a given enzyme, particularly within a class of closely related molecules. Combining targeted experimental activity data with structural modeling, ligand docking, and physicochemical properties of proteins and ligands with various machine learning models provides complementary information that can lead to accurate predictions of substrate scope for related enzymes. Here we describe such an approach that can predict the substrate scope of bacterial nitrilases, which catalyze the hydrolysis of nitrile compounds to the corresponding carboxylic acids and ammonia. Each of the four machine learning models (logistic regression, random forest, gradient-boosted decision trees, and support vector machines) performed similarly (average ROC = 0.9, average accuracy = ~82%) for predicting substrate scope for this dataset, although random forest offers some advantages. This approach is intended to be highly modular with respect to physicochemical property calculations and software used for structural modeling and docking.
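The two figures of merit quoted in the abstract can be illustrated with a minimal sketch. It assumes that "ROC" refers to the area under the ROC curve and that each enzyme-substrate pair carries a binary active/inactive label plus a model score. The function names, the 0.5 threshold, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive (an active
    enzyme-substrate pair) is scored above a randomly chosen negative;
    tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of pairs classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


# Hypothetical toy data: 1 = nitrile accepted as substrate, 0 = not accepted.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted probabilities of activity from some classifier.
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.3, 0.1]

print(f"ROC AUC  = {roc_auc(labels, scores):.4f}")
print(f"accuracy = {accuracy(labels, scores):.2f}")
```

Unlike accuracy, the AUC does not depend on a classification threshold, which is one reason the two metrics are often reported together when comparing classifiers.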

Original language: English
Pages (from-to): 336-347
Number of pages: 12
Journal: Proteins: Structure, Function and Genetics
Volume: 89
Issue number: 3
State: Published - Mar 2021

Funding

National Science Foundation, Grant/Award Number: 2017219379; Oak Ridge National Laboratory, Grant/Award Number: DE-AC05-00OR22725

This work was supported by Laboratory-Directed Research and Development funds from Oak Ridge National Laboratory (ORNL), which is managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This work used resources of the Compute and Data Environment for Science (CADES) at ORNL. CJC was supported by a National Science Foundation Graduate Research Fellowship under Grant No. 2017219379.

Keywords

  • enzyme specificity
  • functional annotation
  • machine learning
  • modular approach
  • substrate scope
