Multi-fidelity learning for interatomic potentials: low-level forces and high-level energies are all you need

  • Mitchell Messerly
  • Sakib Matin
  • Alice E.A. Allen
  • Benjamin Nebgen
  • Kipton Barros
  • Justin S. Smith
  • Nicholas Lubbers
  • Richard Messerly

Research output: Contribution to journal; Article; peer-review

1 Scopus citation

Abstract

The promise of machine learning interatomic potentials (MLIPs) has led to an abundance of public quantum mechanical (QM) training datasets. The quality of an MLIP is directly limited by the accuracy of the energies and atomic forces in the training dataset. Unfortunately, most of these datasets are computed with relatively low-accuracy QM methods, e.g. density functional theory with a moderate basis set. Due to the increased computational cost of more accurate QM methods, e.g. coupled-cluster theory with a complete basis set (CBS) extrapolation, most high-accuracy datasets are much smaller and often do not contain atomic forces. The lack of high-accuracy atomic forces is quite troubling, as training with force data greatly improves the stability and quality of the MLIP compared to training to energy alone. Because most datasets are computed with a unique level of theory, traditional single-fidelity (SF) learning is not capable of leveraging the vast amounts of published QM data. In this study, we apply multi-fidelity learning (MFL) to train an MLIP to multiple QM datasets of different levels of accuracy, i.e. levels of fidelity. Specifically, we perform three test cases to demonstrate that MFL with both low-level forces and high-level energies yields an extremely accurate MLIP—far more accurate than a SF MLIP trained solely to high-level energies and almost as accurate as a SF MLIP trained directly to high-level energies and forces. Therefore, MFL greatly alleviates the need for generating large and expensive datasets containing high-accuracy atomic forces and allows for more effective training to existing high-accuracy energy-only datasets. Indeed, low-accuracy atomic forces and high-accuracy energies are all that are needed to achieve a high-accuracy MLIP with MFL.
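The abstract does not specify the network architecture or the loss function, so the following is only a minimal sketch of how training to low-level forces and high-level energies at once could be expressed. It assumes one prediction head per level of theory and a masked mean-squared-error loss in which samples without force labels contribute only an energy term. The function name, argument names, array shapes, and the `force_weight` parameter are all hypothetical and are not taken from the paper.

```python
import numpy as np


def multi_fidelity_loss(pred_energy, pred_forces, ref_energy, ref_forces,
                        fidelity, has_forces, force_weight=1.0):
    """Masked multi-fidelity loss (illustrative assumption, not the paper's loss).

    pred_energy: (n_samples, n_fidelities) predicted energy per fidelity head
    pred_forces: (n_samples, n_fidelities, n_atoms, 3) predicted forces per head
    ref_energy:  (n_samples,) reference energies
    ref_forces:  (n_samples, n_atoms, 3) reference forces; entries for samples
                 without force labels are ignored (they may be NaN)
    fidelity:    (n_samples,) integer index of the level of theory of each label
    has_forces:  (n_samples,) boolean mask, True where force labels exist
    """
    rows = np.arange(len(fidelity))
    # Pick, for every sample, the head matching the fidelity of its label.
    e_pred = pred_energy[rows, fidelity]
    f_pred = pred_forces[rows, fidelity]

    energy_loss = np.mean((e_pred - ref_energy) ** 2)

    # Force term only over samples that actually carry force labels,
    # e.g. the low-level dataset; high-level energy-only samples are skipped.
    if np.any(has_forces):
        force_loss = np.mean((f_pred[has_forces] - ref_forces[has_forces]) ** 2)
    else:
        force_loss = 0.0

    return energy_loss + force_weight * force_loss


# Toy data: sample 0 is low-level (index 0) with forces,
# sample 1 is high-level (index 1) with an energy label only.
pred_energy = np.array([[1.0, 2.0], [3.0, 4.0]])
pred_forces = np.zeros((2, 2, 1, 3))
ref_energy = np.array([1.0, 5.0])
ref_forces = np.stack([np.ones((1, 3)), np.full((1, 3), np.nan)])
fidelity = np.array([0, 1])
has_forces = np.array([True, False])

loss = multi_fidelity_loss(pred_energy, pred_forces, ref_energy, ref_forces,
                           fidelity, has_forces)
```

In this sketch the two heads would share the underlying network parameters, which is what lets force information from the low-level data shape the potential energy surface that the high-level energy head also uses. That sharing is an assumption of the example rather than a detail stated in the abstract.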

Original language: English
Article number: 035066
Journal: Machine Learning: Science and Technology
Volume: 6
Issue number: 3
State: Published - Sep 30 2025

Funding

M M, S M, A E A A, B N, K B, N L, and R A M acknowledge support from the U.S. Department of Energy (DOE), Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under Triad National Security, LLC (‘Triad’) contract Grant 89233218CNA000001 (FWP: LANLE3F2 and LANLE8AN). M M gratefully acknowledges the resources of the Los Alamos National Laboratory (LANL) Computational Science summer student program. The work at LANL was supported by the LANL Laboratory Directed Research and Development (LDRD) Project 20230290ER. Work at LANL was performed in part at the Center for Nonlinear Studies and the Center for Integrated Nanotechnologies, a U.S. DOE Office of Science user facility at LANL. This research used resources provided by the Darwin testbed at LANL, which is funded by the Computational Systems and Software Environments subprogram of LANL’s Advanced Simulation and Computing program (NNSA/DOE). This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. DOE under Contract No. DE-AC05-00OR22725. This manuscript has been authored in part by UT-Battelle, LLC, under Contract DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The U.S. government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for U.S. government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • computational chemistry
  • coupled-cluster theory
  • density functional theory
  • machine learning interatomic potentials
  • molecular dynamics
  • multi-task learning
  • neural networks
