Abstract
In the field of computational chemistry, predicting bond dissociation energies (BDEs) presents well-known challenges, particularly due to the multireference character of reactive systems. Many chemical reactions involve configurations where single-reference methods fall short, as the electronic structure can significantly change during bond breaking. As generating training data for partially broken bonds is a challenging task, even state-of-the-art reactive machine learning interatomic potentials (MLIPs) often fail to predict reliable BDEs and smooth dissociation curves. By contrast, simple and inexpensive physics-based models, such as the well-established Morse potential, do not suffer from any such limitations. This work leverages the Morse potential to improve reactive MLIPs by augmenting the training data set with inexpensive Morse data along the dissociation pathways. This physics-constrained data augmentation (PCDA) approach results in MLIPs with smooth bond dissociation curves as well as near coupled-cluster level BDEs, all without requiring any expensive multireference quantum mechanical calculations. A case study for methane combustion demonstrates how the PCDA approach can improve an existing reactive MLIP, namely, ANI-1xnr. Not only are the BDEs and bond dissociation curves for all radicals and molecules significantly improved compared to ANI-1xnr but the PCDA-trained MLIP retains the reliability of ANI-1xnr when performing reactive molecular dynamics simulations.
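The Morse-based augmentation described in the abstract can be illustrated with a minimal sketch. It assumes the standard Morse form V(r) = D_e (1 - exp(-a (r - r_e)))^2 and samples energies on an even grid along a stretched bond. The function names `morse_energy` and `morse_scan` and all parameter values are hypothetical placeholders, not taken from the paper, and the way such points are merged into an MLIP training set is not shown here.

```python
import math


def morse_energy(r, d_e, a, r_e):
    """Morse potential V(r) = D_e * (1 - exp(-a * (r - r_e)))**2.

    V(r_e) = 0 and V(r) approaches D_e as r grows, so D_e is the well depth.
    """
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2


def morse_scan(d_e, a, r_e, r_max, n_points):
    """Return (r, V(r)) pairs on an even grid from r_e to r_max."""
    step = (r_max - r_e) / (n_points - 1)
    return [
        (r_e + i * step, morse_energy(r_e + i * step, d_e, a, r_e))
        for i in range(n_points)
    ]


# Placeholder parameters for illustration only (assumed, not from the paper):
# well depth in eV, width parameter in 1/Angstrom, distances in Angstrom.
scan = morse_scan(d_e=4.5, a=1.9, r_e=1.09, r_max=5.0, n_points=40)
```

In this form the energy rises smoothly and monotonically from zero at the equilibrium distance toward the well depth, which is the qualitative behavior the abstract attributes to the Morse potential along a dissociation pathway.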
Original language | English |
---|---|
Pages (from-to) | 1198-1210 |
Number of pages | 13 |
Journal | Journal of Chemical Information and Modeling |
Volume | 65 |
Issue number | 3 |
State | Published - Feb 10 2025 |
Funding
The authors thank Hans Lischka, Sergei Tretiak, Nicholas Lubbers, Kipton Barros and Lorena Alzate-Vargas for insightful discussions regarding physics-constrained machine learning and multireference character of bond dissociations. B.T.N., A.E.A.A., B.W.H., S.M., and R.A.M. acknowledge support from the US Department of Energy, Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under Triad National Security, LLC (“Triad”) contract grant 89233218CNA000001 (FWP: LANLE3F2). The work at LANL was supported by the LANL Directed Research and Development Funds 20230435ECR. Work at LANL was performed in part at the Center for Nonlinear Studies and the Center for Integrated Nanotechnologies, a US Department of Energy Office of Science user facility at LANL. This research used resources provided by the LANL Institutional Computing Program. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. L.G.F.dS. expresses gratitude for the Helen DeVitt Jones Graduate Fellowship offered by the Texas Tech University Graduate School. Open access funded by Max Planck Society.