Abstract
Machine learning (ML) models, if trained to data sets of high-fidelity quantum simulations, produce accurate and efficient interatomic potentials. Active learning (AL) is a powerful tool to iteratively generate diverse data sets. In this approach, the ML model provides an uncertainty estimate along with its prediction for each new atomic configuration. If the uncertainty estimate passes a certain threshold, then the configuration is included in the data set. Here we develop a strategy to more rapidly discover configurations that meaningfully augment the training data set. The approach, uncertainty-driven dynamics for active learning (UDD-AL), modifies the potential energy surface used in molecular dynamics simulations to favor regions of configuration space for which there is large model uncertainty. The performance of UDD-AL is demonstrated for two AL tasks: sampling the conformational space of glycine and sampling the promotion of proton transfer in acetylacetone. The method is shown to efficiently explore the chemically relevant configuration space, which may be inaccessible using regular dynamical sampling at target temperature conditions.
Original language | English |
---|---|
Pages (from-to) | 230-239 |
Number of pages | 10 |
Journal | Nature Computational Science |
Volume | 3 |
Issue number | 3 |
DOIs | |
State | Published - Mar 2023 |
Externally published | Yes |
Funding
K.B., N.L., R.M., S.T. and B.N. acknowledge support from the United States Department of Energy (US DOE), Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under Triad National Security, LLC (‘Triad’) contract grant no. 89233218CNA000001 (FWP: LANLE3F2). M.K. and Y.W.L. acknowledge support from the Los Alamos National Laboratory (LANL) Directed Research and Development funds. M.K. acknowledges financial support from the Director’s Postdoctoral Fellowship at LANL. This work was performed in part at the Center for Nonlinear Studies and the Center for Integrated Nanotechnology, a US DOE and Office of Basic Energy Sciences user facility. This research used resources provided by the LANL Institutional Computing Program, which is supported by the US DOE National Nuclear Security Administration under contract no. 89233218CNA000001. We also acknowledge the CCS-7 Darwin cluster at LANL for additional computing resources.
Funders | Funder number |
---|---|
Center for Nonlinear Studies | |
U.S. Department of Energy | |
Office of Science | |
Basic Energy Sciences | |
National Nuclear Security Administration | |
Los Alamos National Laboratory | |
Chemical Sciences, Geosciences, and Biosciences Division | 89233218CNA000001, LANLE3F2 |