Abstract
Accurate prediction of NMR chemical shifts at an affordable computational cost is important for many types of structural assignment in experimental studies. Density functional theory (DFT) and the gauge-including atomic orbital (GIAO) approach are two of the most popular computational methods for NMR calculations, yet they often fail to resolve ambiguities in structural assignments. Here, we present a new method, DFT + ML, that augments DFT with machine learning (ML) techniques and significantly increases the accuracy of ¹³C/¹H NMR chemical shift prediction for a variety of organic molecules. The input of the generalizable DFT + ML model has two critical parts: a vector describing chemical environments, which can be evaluated without knowing the exact geometry of the molecule, and the DFT-calculated isotropic shielding constant. The DFT + ML model was trained on a data set containing 476 ¹³C and 270 ¹H experimental chemical shifts. For the DFT methods used here, the root mean square deviations (RMSDs) between predicted and experimental ¹³C/¹H chemical shifts can be as small as 2.10/0.18 ppm. These values are much lower than those from the simple DFT (5.54/0.25 ppm) and DFT + linear regression (LR) (4.77/0.23 ppm) approaches. The DFT + ML model also has a smaller maximum absolute error than two previously proposed ML models for NMR prediction. Its robustness was tested on two classes of organic molecules (TIC10 and hyacinthacines), for which the correct isomers were unambiguously matched to the experimental data. Overall, the DFT + ML model shows promise for structural assignments in a variety of systems, including stereoisomers, that are often challenging to determine experimentally.
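The abstract compares three ways of turning a DFT isotropic shielding constant σ into a predicted chemical shift δ and scores each one by RMSD against experiment. The following is a minimal sketch of the two baselines and the metric, built on stated assumptions. It assumes that "simple DFT" means referencing to a standard, δ = σ_ref − σ (for example, TMS computed at the same level of theory). It also assumes that "DFT + LR" means an ordinary least-squares fit δ = a·σ + b. The helper `ml_input` is hypothetical. It only illustrates the two-part input the abstract describes, an environment descriptor plus σ, and does not reproduce the authors' ML model or descriptor.

```python
import math


def rmsd(predicted, experimental):
    """Root mean square deviation (ppm) between predicted and experimental shifts."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def shift_from_reference(sigma, sigma_ref):
    """Assumed 'simple DFT' conversion: delta = sigma_ref - sigma for each nucleus."""
    return [sigma_ref - s for s in sigma]


def fit_linear_scaling(sigma, delta_exp):
    """Assumed 'DFT + LR' baseline: least-squares fit of delta = slope * sigma + intercept."""
    if len(sigma) != len(delta_exp) or len(sigma) < 2:
        raise ValueError("need at least two paired (sigma, delta) values")
    n = len(sigma)
    mean_s = sum(sigma) / n
    mean_d = sum(delta_exp) / n
    sxx = sum((s - mean_s) ** 2 for s in sigma)
    if sxx == 0:
        raise ValueError("shielding values must not all be identical")
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(sigma, delta_exp))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_s
    return slope, intercept


def predict_linear(sigma, slope, intercept):
    """Apply a fitted linear scaling to new shielding constants."""
    return [slope * s + intercept for s in sigma]


def ml_input(env_vector, sigma):
    """Hypothetical feature layout: chemical-environment descriptor followed by sigma."""
    return list(env_vector) + [sigma]
```

When the scaling is used as a baseline, one reasonable design choice is to fit `slope` and `intercept` on one set of molecules and compute `rmsd` on a separate set. That way the comparison with an ML model is not flattered by an in-sample error.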
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 3746-3754 |
Number of pages | 9 |
Journal | Journal of Chemical Information and Modeling |
Volume | 60 |
Issue number | 8 |
DOIs | |
State | Published - Aug 24 2020 |
Externally published | Yes |
Funding
Resources provided at the NCI National Facility systems at the Australian National University through the National Computational Merit Allocation Scheme supported by the Australian Government (Project id: v15). This work was partially supported (J.Z. and V.-A.G.) by the U.S. Department of Energy, Office of Science, Chemical Sciences, Geosciences, and Biosciences Division, Award #72353. P.G. acknowledges the Australian Government for an Australian International Postgraduate Award scholarship. J.Z. and V.-A.G. acknowledge Research Computing at PNNL for the ML analysis. Q.P. acknowledges the National Science Foundation of China (Grants 21890722, 21702109, and 11811530637), the Natural Science Foundation of Tianjin City (18JCYBJC21400), and the Fundamental Research Funds of the Central Universities (Nos. 63191515, 63196021, 63191523) for financial support. The views and opinions of the authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.