Critical evaluation of assessor difference correction approaches in sensory analysis
Published version

Permanent link: https://hdl.handle.net/11250/3049333
Publication date: 2022

Collections:
- Artikler / Articles [1484]
- Publikasjoner fra CRIStin [2568]

Original version: Food Quality and Preference. 2022, 106. https://doi.org/10.1016/j.foodqual.2022.104792

Abstract
In sensory data analysis, assessor-dependent scaling effects may hinder the analysis of product differences. Romano et al. (2008) compared several approaches to reduce scaling differences between assessors by their ability to maximise the product effect F-values in a mixed ANOVA analysis. Their study on a sensory dataset of 14 cheese samples assessed by twelve assessors on a continuous scale showed that some of these approaches apparently improved the F-value of the product effect. However, this direct comparison is only legitimate if these F-values originate from the same null distribution. To obtain the null distributions of the different correction methods, we employed a permutation approach on the same cheese dataset also used by Romano et al. (2008) and a random noise simulation approach. Based on the empirically obtained null distributions, we calculated the corrected product effect significance to directly compare the performance of the preprocessing methods. Our results show that the null distributions of some preprocessing methods do not correspond to the expected F-distribution. In particular for the ten Berge method, the null distribution is shifted towards higher F-values. Therefore, an observed increase of the product effect F-value, as compared to the F-value on raw data, does not necessarily lead to increased product effect significance. If p-values are calculated based on such inflated F-values, significance may thus be overestimated. In contrast, calculation of p-values directly from the empirical null distributions obtained by permutation provides a common ground to properly compare method performance. Moreover, we show that differences in reproducibility between assessors, as they exist in real-world sensory datasets, may lead to overestimation of product effect significance by the mixed assessor model (MAM).
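The permutation approach described above can be sketched in a few lines: permute the product labels independently within each assessor, re-apply the scaling correction to each permuted dataset, and compare the observed product-effect F-value to the resulting empirical null distribution. The sketch below is a minimal illustration only; per-assessor standardization stands in as a placeholder correction method, and the actual methods compared in the paper (e.g. ten Berge, MAM) are not reproduced here.

```python
import numpy as np

def product_f_value(scores):
    """F-value for the product effect in a two-way (product x assessor)
    ANOVA without replicates: MS_product / MS_residual.
    `scores` has shape (n_products, n_assessors)."""
    n_prod, n_ass = scores.shape
    grand = scores.mean()
    prod_means = scores.mean(axis=1)
    ass_means = scores.mean(axis=0)
    ss_prod = n_ass * np.sum((prod_means - grand) ** 2)
    ss_ass = n_prod * np.sum((ass_means - grand) ** 2)
    ss_res = np.sum((scores - grand) ** 2) - ss_prod - ss_ass
    ms_prod = ss_prod / (n_prod - 1)
    ms_res = ss_res / ((n_prod - 1) * (n_ass - 1))
    return ms_prod / ms_res

def standardize_per_assessor(scores):
    """Placeholder scaling correction: centre and scale each
    assessor's (column's) scores. Not one of the paper's methods."""
    return (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)

def permutation_p_value(scores, preprocess, n_perm=999, seed=0):
    """Empirical p-value: shuffle product labels within each assessor,
    re-apply the correction, and locate the observed F in the null."""
    rng = np.random.default_rng(seed)
    f_obs = product_f_value(preprocess(scores))
    f_null = np.empty(n_perm)
    for i in range(n_perm):
        # Permute each assessor's column independently.
        permuted = np.apply_along_axis(rng.permutation, 0, scores)
        f_null[i] = product_f_value(preprocess(permuted))
    # Add-one correction so the p-value is never exactly zero.
    return (np.sum(f_null >= f_obs) + 1) / (n_perm + 1)
```

Because the correction is re-applied inside the permutation loop, the null distribution reflects whatever F-value inflation the preprocessing itself introduces, which is the common ground the paper argues for when comparing methods.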