Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring

Skou, Peter B; Tonolini, Margherita; Eskildsen, Carl Emil Aae; Berg, Frans van den; Rasmussen, Morten Arendt

dc.contributor.author	Skou, Peter B
dc.contributor.author	Tonolini, Margherita
dc.contributor.author	Eskildsen, Carl Emil Aae
dc.contributor.author	Berg, Frans van den
dc.contributor.author	Rasmussen, Morten Arendt
dc.date.accessioned	2023-08-08T12:18:24Z
dc.date.available	2023-08-08T12:18:24Z
dc.date.created	2023-07-10T14:58:49Z
dc.date.issued	2023
dc.identifier.citation	Journal of Near Infrared Spectroscopy. 2023, 31 (4), 186-195.
dc.identifier.issn	0967-0335
dc.identifier.uri	https://hdl.handle.net/11250/3083048
dc.description.abstract	Partial least squares (PLS) regression is widely used to predict chemical analytes from spectroscopic data, thus reducing the need for expensive and time-consuming wet chemical reference analysis in industrial process monitoring. However, predictions via PLS by definition carry sample-specific errors, and estimation of these errors is essential for correct interpretation of results. To increase trust in PLS regression-based predictions, reliable prediction error estimates must be reported. This can be achieved by determining realistic sample-specific prediction errors using an unbiased mean squared prediction error estimate. This work provides a guide for estimating sample-specific prediction errors, showing the importance of choosing an appropriate error estimator prior to deploying PLS models for industrial applications. We reviewed recent and established methods for estimating the sample-specific prediction error and tested them through simulation studies. The methods were subsequently applied to estimate prediction errors in two real-life datasets from the food ingredients industry, where near-infrared spectroscopy was used to quantify i) urea in process water and ii) individual protein concentrations in ultrafiltration retentates from a protein fractionation process. Both the simulations and the real data examples showed that the mean squared error of calibration is always a downward-biased estimator. Although leave-one-out cross-validation performed surprisingly well on the data analysed in this work, this paper demonstrates that the appropriate choice of error estimator requires the user to make an informed, data-centered decision.
dc.language.iso	eng
dc.title	Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring
dc.title.alternative	Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring
dc.type	Peer reviewed
dc.type	Journal article
dc.description.version	publishedVersion
dc.source.pagenumber	186-195
dc.source.volume	31
dc.source.journal	Journal of Near Infrared Spectroscopy
dc.source.issue	4
dc.identifier.doi	10.1177/09670335231173139
dc.identifier.cristin	2161729
dc.relation.project	EC/H2020/801199
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	1
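
The abstract's contrast between the mean squared error of calibration (MSEC) and leave-one-out cross-validation can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a one-component PLS1 model, pure-Python arithmetic, and simulated data with few samples and many variables. All function names (`fit_pls1`, `predict`, `msec`, `mse_loocv`, `simulate`) and the simulation settings are hypothetical, and MSEC is computed here as the plain mean of squared calibration residuals without any degrees-of-freedom correction.

```python
import math
import random


def fit_pls1(X, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, w, q)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        # No covariance between X and y: fall back to predicting the mean.
        return x_mean, y_mean, [0.0] * p, 0.0
    w = [v / norm for v in w]
    # Scores, then the regression coefficient of y on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return x_mean, y_mean, w, q


def predict(model, x):
    """Predict y for one sample x from a fitted one-component model."""
    x_mean, y_mean, w, q = model
    score = sum((x[j] - x_mean[j]) * w[j] for j in range(len(x)))
    return y_mean + q * score


def msec(X, y):
    """Mean squared error on the same samples used to fit the model."""
    model = fit_pls1(X, y)
    return sum((predict(model, x) - v) ** 2 for x, v in zip(X, y)) / len(y)


def mse_loocv(X, y):
    """Mean squared error where each sample is predicted by a model
    fitted without that sample (leave-one-out cross-validation)."""
    total = 0.0
    for i in range(len(y)):
        model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        total += (predict(model, X[i]) - y[i]) ** 2
    return total / len(y)


def simulate(n=15, p=30, noise=1.0, seed=0):
    """Hypothetical data: y depends on the first variable plus noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [row[0] + noise * rng.gauss(0, 1) for row in X]
    return X, y


X, y = simulate()
print("MSEC:       ", msec(X, y))
print("MSE (LOOCV):", mse_loocv(X, y))
```

Because the calibration residuals come from samples the model has already seen, the first number would typically be the smaller of the two in a setting like this one; how large the gap is depends on the sample size, the number of variables, and the noise level chosen in `simulate`.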

Associated file(s)

Filename: skou-et-al-2023-unbiased-predi ...
Size: 1.176Mb
Format: PDF

This item appears in the following collection(s)

Articles [1429]
Publications from CRIStin [2493]