Improving Bond Dissociations of Reactive Machine Learning Potentials through Physics-Constrained Data Augmentation

Luan G. Luan, Benjamin T. Nebgen, Alice E.A. Allen, Brenden W. Hamilton, Sakib Matin, Justin S. Smith, Richard A. Messerly

Research output: Contribution to journal › Article › peer-review

Abstract

In the field of computational chemistry, predicting bond dissociation energies (BDEs) presents well-known challenges, particularly due to the multireference character of reactive systems. Many chemical reactions involve configurations where single-reference methods fall short, as the electronic structure can significantly change during bond breaking. As generating training data for partially broken bonds is a challenging task, even state-of-the-art reactive machine learning interatomic potentials (MLIPs) often fail to predict reliable BDEs and smooth dissociation curves. By contrast, simple and inexpensive physics-based models, such as the well-established Morse potential, do not suffer from any such limitations. This work leverages the Morse potential to improve reactive MLIPs by augmenting the training data set with inexpensive Morse data along the dissociation pathways. This physics-constrained data augmentation (PCDA) approach results in MLIPs with smooth bond dissociation curves as well as near coupled-cluster level BDEs, all without requiring any expensive multireference quantum mechanical calculations. A case study for methane combustion demonstrates how the PCDA approach can improve an existing reactive MLIP, namely, ANI-1xnr. Not only are the BDEs and bond dissociation curves for all radicals and molecules significantly improved compared to ANI-1xnr but the PCDA-trained MLIP retains the reliability of ANI-1xnr when performing reactive molecular dynamics simulations.
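The augmentation idea described in the abstract can be illustrated with a minimal sketch. It uses the standard Morse form, V(r) = D_e (1 - exp(-a (r - r_e)))^2, shifted so that the energy is zero at the equilibrium bond length and approaches D_e at dissociation. The function names, the evenly spaced grid, and all parameter values below are assumptions made for illustration; the abstract does not specify how the authors parameterize the Morse potential or how the resulting data are merged into the training set.

```python
import math


def morse_energy(r, d_e, a, r_e):
    """Standard Morse potential, shifted so V(r_e) = 0 and V(r -> inf) = d_e."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def morse_augmentation_points(d_e, a, r_e, r_max, n_points):
    """Return (bond length, Morse energy) pairs on an even grid from r_e to r_max.

    Hypothetical helper: the grid choice is an assumption, not the paper's protocol.
    """
    step = (r_max - r_e) / (n_points - 1)
    return [
        (r_e + i * step, morse_energy(r_e + i * step, d_e, a, r_e))
        for i in range(n_points)
    ]


# Placeholder parameters for illustration only (not taken from the paper).
D_E = 4.5   # well depth in eV (assumed)
A = 1.9     # width parameter in 1/Angstrom (assumed)
R_E = 1.09  # equilibrium bond length in Angstrom (assumed)

points = morse_augmentation_points(D_E, A, R_E, r_max=5.0, n_points=40)
```

Each pair in `points` could then serve as an inexpensive reference energy for a geometry stretched along one bond, in the spirit of the PCDA approach. Because the Morse curve is smooth and plateaus at the well depth, data of this kind constrain the model where quantum mechanical reference data are hardest to generate.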

Original language: English
Pages (from-to): 1198-1210
Number of pages: 13
Journal: Journal of Chemical Information and Modeling
Volume: 65
Issue number: 3
State: Published - Feb 10 2025

Funding

The authors thank Hans Lischka, Sergei Tretiak, Nicholas Lubbers, Kipton Barros and Lorena Alzate-Vargas for insightful discussions regarding physics-constrained machine learning and multireference character of bond dissociations. B.T.N., A.E.A.A., B.W.H., S.M., and R.A.M. acknowledge support from the US Department of Energy, Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under Triad National Security, LLC (“Triad”) contract grant 89233218CNA000001 (FWP: LANLE3F2). The work at LANL was supported by the LANL Directed Research and Development Funds 20230435ECR. Work at LANL was performed in part at the Center for Nonlinear Studies and the Center for Integrated Nanotechnologies, a US Department of Energy Office of Science user facility at LANL. This research used resources provided by the LANL Institutional Computing Program. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. L.G.F.dS. expresses gratitude for the Helen DeVitt Jones Graduate Fellowship offered by the Texas Tech University Graduate School. Open access funded by Max Planck Society.
