Abstract
Selecting optimal combinations of preprocessing methods is a major holdup for chemometric analysis. The analyst decides which method(s) to apply to the data, frequently by highly subjective or inefficient means, such as user experience or trial and error. Here, we present a user-friendly method using optimal experimental designs for selecting preprocessing transformations. We applied this strategy to optimize partial least squares regression (PLSR) analysis of Stokes Raman spectra to quantify hydroxylammonium (0-0.5 M), nitric acid (0-1 M), and total nitrate (0-1.5 M) concentrations. The best PLSR model chosen by a determinant (D)-optimal design comprising 26 samples (i.e., combinations of preprocessing methods) was compared with PLSR models built with no preprocessing, a user-selected preprocessing method (i.e., trial and error), and a user-defined design strategy (576 samples). The D-optimal selection strategy improved PLSR prediction performance by more than 50% compared with the raw data and reduced the number of combinations by more than 95.5%.
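To make the idea of a D-optimal subset concrete, the following is a minimal sketch and not the authors' implementation. It assumes six hypothetical preprocessing factors whose level counts (4, 4, 3, 3, 2, 2) multiply to 576 candidate combinations; the abstract does not list the actual factors or levels. It also assumes a main-effects model and uses a generic greedy point-exchange search to pick 26 runs that maximize log|XᵀX|. All names (`LEVELS`, `model_matrix`, `d_optimal`) are illustrative.

```python
import itertools
import numpy as np

# Hypothetical factor levels (NOT from the paper): six preprocessing steps
# whose level counts multiply to 576 candidate combinations.
LEVELS = (4, 4, 3, 3, 2, 2)


def model_matrix(runs, levels):
    """Intercept plus dummy-coded main effects (level 0 is the reference)."""
    cols = [np.ones(len(runs))]
    for j, n_levels in enumerate(levels):
        for level in range(1, n_levels):
            cols.append((runs[:, j] == level).astype(float))
    return np.column_stack(cols)


def log_det(X):
    """Return log|X'X|, or -inf if the information matrix is singular."""
    sign, value = np.linalg.slogdet(X.T @ X)
    return value if sign > 0 else -np.inf


def d_optimal(X, n_runs, seed=0, max_passes=5):
    """Greedy point exchange: swap a design row for an unused candidate
    whenever the swap increases log|X'X|."""
    rng = np.random.default_rng(seed)
    design = [int(i) for i in rng.choice(len(X), size=n_runs, replace=False)]
    best = log_det(X[design])
    for _ in range(max_passes):
        improved = False
        for i in range(n_runs):
            for c in range(len(X)):
                if c in design:  # no replicates: preprocessing is deterministic
                    continue
                trial = design.copy()
                trial[i] = c
                score = log_det(X[trial])
                if score > best + 1e-10:
                    design, best, improved = trial, score, True
        if not improved:
            break
    return design, best


# Full candidate set: every combination of factor levels.
candidates = np.array(list(itertools.product(*[range(n) for n in LEVELS])))
X = model_matrix(candidates, LEVELS)
design, score = d_optimal(X, n_runs=26)

# Fraction of candidate combinations that no longer need to be evaluated.
reduction = 1 - len(design) / len(candidates)
print(len(candidates), len(design), f"{reduction:.1%}")
```

In a workflow like the one the abstract describes, each selected row would correspond to one preprocessing combination for which a PLSR model is built and scored, so only 26 of the 576 candidates are evaluated.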
Original language | English |
---|---|
Pages (from-to) | 7287-7296 |
Number of pages | 10 |
Journal | ACS Omega |
Volume | 7 |
Issue number | 8 |
State | Published - Mar 1 2022 |
Funding
The authors wish to thank Erica Heinrich for assistance with the technical review of this manuscript. This work was supported by the ²³⁸Pu Supply Program at the US Department of Energy's Oak Ridge National Laboratory. Funding for this program was provided by the Science Mission Directorate of the National Aeronautics and Space Administration and administered by the US Department of Energy, Office of Nuclear Energy, under contract DE-AC05-00OR22725. This work used resources at the High Flux Isotope Reactor, a Department of Energy Office of Science User Facility operated by Oak Ridge National Laboratory. This work was also supported in part by the US Department of Energy, Office of Science, Office of Workforce Development for Teachers and Scientists (WDTS), under the Science Undergraduate Laboratory Internship Program at Oak Ridge National Laboratory, administered by the Oak Ridge Institute for Science and Education.

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).