Pole selection in Polarized Sensory Positioning: Insights from the cognitive aspects behind the task

Ares, Gastón; Antúnez, Lucía; Oliveira, Denize; Alcaire, Florencia; Giménez, Ana; Berget, Ingunn; Næs, Tormod; Tomasco, Paula Alejandra Varela

dc.contributor.author	Ares, Gastón
dc.contributor.author	Antúnez, Lucía
dc.contributor.author	Oliveira, Denize
dc.contributor.author	Alcaire, Florencia
dc.contributor.author	Giménez, Ana
dc.contributor.author	Berget, Ingunn
dc.contributor.author	Næs, Tormod
dc.contributor.author	Tomasco, Paula Alejandra Varela
dc.date.accessioned	2018-02-07T07:37:05Z
dc.date.available	2018-02-07T07:37:05Z
dc.date.created	2015-09-09T14:11:39Z
dc.date.issued	2015
dc.identifier.citation	Food Quality and Preference. 2015, 46, 48-57.
dc.identifier.issn	0950-3293
dc.identifier.uri	http://hdl.handle.net/11250/2483091
dc.description.abstract	Background: The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of accessible sequence data, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approaches with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. Results: The differences in classification error between the methods were small, but they were stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. Conclusions: We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data is needed to improve classification further. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal and more robust training data set is crucial.
dc.language.iso	eng
dc.title	Pole selection in Polarized Sensory Positioning: Insights from the cognitive aspects behind the task
dc.type	Peer reviewed
dc.type	Journal article
dc.description.version	submittedVersion
dc.source.pagenumber	48-57
dc.source.volume	46
dc.source.journal	Food Quality and Preference
dc.identifier.doi	10.1016/j.foodqual.2015.07.003
dc.identifier.cristin	1262944
dc.relation.project	Norges forskningsråd: 225062
dc.relation.project	Norges forskningsråd: 225096
dc.relation.project	Nofima AS: 201308
dc.relation.project	Nofima AS: 201302
dc.relation.project	Nofima AS: 10841
dc.relation.project	Norges forskningsråd: 233684
cristin.unitcode	7543,3,2,0
cristin.unitcode	7543,3,3,0
cristin.unitname	Råvare og prosess
cristin.unitname	Sensorikk, forbruker og innovasjon
cristin.ispublished	true
cristin.fulltext	preprint
cristin.qualitycode	1

Associated file(s)

Filename: FQAP-S-15-00224.pdf
Size: 772.7Kb
Format: PDF

This item appears in the following collection(s)

Artikler / Articles [1456]
Publications from CRIStin [2533]
