Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring
Skou, Peter B; Tonolini, Margherita; Eskildsen, Carl Emil Aae; Berg, Frans van den; Rasmussen, Morten Arendt
Peer reviewed, Journal article
Published version
Date
2023
Original version
Journal of Near Infrared Spectroscopy. 2023, 31 (4), 186-195. 10.1177/09670335231173139
Abstract
Partial least squares (PLS) regression is widely used to predict chemical analytes from spectroscopic data, thus reducing the need for expensive and time-consuming wet chemical reference analysis in industrial process monitoring. However, predictions via PLS by definition carry sample-specific errors, and estimation of these errors is essential for correct interpretation of results. To increase trust in PLS regression-based predictions, reliable prediction error estimates must be reported. This can be achieved by determining realistic sample-specific prediction errors using an unbiased mean squared prediction error estimate. This work provides a guide for estimating sample-specific prediction errors, showing the importance of choosing an appropriate error estimator prior to deploying PLS models for industrial applications. We reviewed recent and established methods for estimating the sample-specific prediction error and tested them through simulation studies. The methods were subsequently applied to estimate prediction errors in two real-life datasets from the food ingredients industry, where near-infrared spectroscopy was used to quantify i) urea in process water and ii) individual protein concentrations in ultrafiltration retentates from a protein fractionation process. Both the simulations and the real data examples showed that the mean squared error of calibration is always a downward-biased estimator. Although leave-one-out cross-validation performed surprisingly well on the data analysed in this work, this paper demonstrates that the appropriate choice of error estimator requires the user to make an informed, data-centered decision.
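The downward bias of the mean squared error of calibration can be illustrated with a small simulation. The sketch below is not the authors' simulation design or code. It is a minimal example that assumes a three-factor latent model, a hand-written single-response PLS (PLS1) fit in NumPy, and a calibration error computed as a plain mean over the calibration samples. All function names, sample sizes, noise levels and the number of components are hypothetical choices made for illustration. It compares three estimates averaged over repeated draws: the calibration error, the leave-one-out cross-validation error, and the error on a large independent test set.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) by sequential deflation of X and y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # weight vector: covariance direction
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading (scalar)
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef


def simulate(rng, n, loadings, y_weights, noise_sd=0.5):
    """Draw n samples from an assumed latent-factor model (illustrative only)."""
    scores = rng.standard_normal((n, loadings.shape[0]))
    X = scores @ loadings + 0.1 * rng.standard_normal((n, loadings.shape[1]))
    y = scores @ y_weights + noise_sd * rng.standard_normal(n)
    return X, y


def compare_estimators(n_train=25, n_test=500, n_vars=50, n_comp=5,
                       n_reps=50, seed=0):
    """Average calibration, leave-one-out and test-set MSE over repeated draws."""
    rng = np.random.default_rng(seed)
    loadings = rng.standard_normal((3, n_vars))   # shared by all draws
    y_weights = np.array([1.0, -0.5, 0.25])
    msec, loo, test = [], [], []
    for _ in range(n_reps):
        X, y = simulate(rng, n_train, loadings, y_weights)
        X_new, y_new = simulate(rng, n_test, loadings, y_weights)
        model = pls1_fit(X, y, n_comp)
        # Calibration error: residuals on the samples used to fit the model.
        msec.append(np.mean((y - pls1_predict(model, X)) ** 2))
        # Reference: error on samples the model has never seen.
        test.append(np.mean((y_new - pls1_predict(model, X_new)) ** 2))
        # Leave-one-out: refit without sample i, then predict sample i.
        press = 0.0
        for i in range(n_train):
            keep = np.arange(n_train) != i
            model_i = pls1_fit(X[keep], y[keep], n_comp)
            press += (y[i] - pls1_predict(model_i, X[i:i + 1])[0]) ** 2
        loo.append(press / n_train)
    return {"msec": float(np.mean(msec)),
            "loo": float(np.mean(loo)),
            "test": float(np.mean(test))}


results = compare_estimators()
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the averaged calibration error should fall below both the cross-validated error and the independent test error, because the calibration residuals come from the same samples that were used to fit the model. How closely the leave-one-out estimate tracks the test error depends on the data, which is consistent with the abstract's point that the estimator has to be chosen case by case.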