Abstract
A high level of physical detail in a molecular model improves its ability to perform high accuracy simulations but can also significantly affect its complexity and computational cost. In some situations, it is worthwhile to add complexity to a model to capture properties of interest; in others, additional complexity is unnecessary and can make simulations computationally infeasible. In this work, we demonstrate the use of Bayesian inference for molecular model selection, using Monte Carlo sampling techniques accelerated with surrogate modeling to evaluate the Bayes factor evidence for different levels of complexity in the two-centered Lennard-Jones + quadrupole (2CLJQ) fluid model. Examining three nested levels of model complexity, we demonstrate that the use of variable quadrupole and bond length parameters in this model framework is justified only for some chemistries. Through this process, we also get detailed information about the distributions and correlation of parameter values, enabling improved parametrization and parameter analysis. We also show how the choice of parameter priors, which encode previous model knowledge, can have substantial effects on the selection of models, penalizing careless introduction of additional complexity. We detail the computational techniques used in this analysis, providing a roadmap for future applications of molecular model selection via Bayesian inference and surrogate modeling.
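As a rough illustration of the Bayes factor comparison described in the abstract, the sketch below uses a deliberately simple toy problem, not the 2CLJQ model or the surrogate-accelerated sampling used in the paper. It assumes Gaussian data with known noise `sigma`, a simple model that fixes the mean at zero, and a more complex nested model that gives the mean a Gaussian prior of width `tau`. All function and parameter names are hypothetical. The evidence for the complex model is computed both analytically and by naive Monte Carlo averaging of the likelihood over prior samples.

```python
import numpy as np


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def log_bayes_factor_analytic(ybar, n, sigma, tau):
    """Log Bayes factor of the complex model (free mean, prior N(0, tau^2))
    over the simple model (mean fixed at 0), given the sample mean ybar of
    n observations with known noise standard deviation sigma.

    Factors of the likelihood that do not depend on the mean cancel in the
    ratio, so only the sampling distribution of ybar matters.
    """
    s2 = sigma**2 / n  # variance of the sample mean
    log_evidence_complex = log_normal_pdf(ybar, 0.0, s2 + tau**2)
    log_evidence_simple = log_normal_pdf(ybar, 0.0, s2)
    return log_evidence_complex - log_evidence_simple


def log_bayes_factor_mc(ybar, n, sigma, tau, n_samples=200_000, seed=0):
    """Same quantity, with the complex-model evidence estimated by averaging
    the likelihood over samples drawn from the prior (toy approach only)."""
    rng = np.random.default_rng(seed)
    s2 = sigma**2 / n
    mu_samples = rng.normal(0.0, tau, size=n_samples)
    log_like = log_normal_pdf(ybar, mu_samples, s2)
    # Log-mean-exp for numerical stability.
    shift = log_like.max()
    log_evidence_complex = shift + np.log(np.mean(np.exp(log_like - shift)))
    return log_evidence_complex - log_normal_pdf(ybar, 0.0, s2)
```

Calling `log_bayes_factor_analytic(0.3, 20, 1.0, tau)` with a moderate and then a very wide `tau` shows the effect of the prior noted in the abstract: when the data do not demand the extra parameter, a broader prior lowers the evidence for the more complex model. Prior-sample averaging becomes inefficient in realistic, higher-dimensional problems, which is one reason more elaborate sampling schemes are needed there.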
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 874-889 |
Number of pages | 16 |
Journal | Journal of Chemical Information and Modeling |
Volume | 62 |
Issue number | 4 |
State | Published - Feb 28 2022 |
Externally published | Yes |
Funding
The authors declare the following competing financial interest(s): The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, Vir Biotechnology, XtalPi, Foresite Labs, the Molecular Sciences Software Institute, the Starr Cancer Consortium, the Open Force Field Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, and the Sloan Kettering Institute. A complete funding history for the Chodera lab can be found at http://choderalab.org/funding. J.D.C. is a current member of the Scientific Advisory Boards of OpenEye Scientific Software, Interline, and Redesign Science and holds equity interests in Interline and Redesign Science. M.R.S. is an Open Science Fellow for Roivant Sciences. S.B. is a director of Boothroyd Scientific Consulting Ltd.
Acknowledgments
We thank the Open Force Field Consortium for funding, including our industry partners as listed at the Open Force Field website, and the Molecular Sciences Software Institute (MolSSI) for its support of the Open Force Field Initiative. We gratefully acknowledge all current and former members of the Open Force Field Initiative and the Open Force Field Scientific Advisory Board. Research reported in this publication was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health under award number R01GM132386, specifically partial support of O.C.M., M.R.S., and J.D.C. O.C.M., M.R.S., J.D.C., and J.F. acknowledge support from NSF CHE-1738975 for parts of the project. These findings are solely those of the authors and do not necessarily represent the official views of the NIH or NSF.