
dc.contributor.author	Ares, Gastón
dc.contributor.author	Antúnez, Lucía
dc.contributor.author	Oliveira, Denize
dc.contributor.author	Alcaire, Florencia
dc.contributor.author	Giménez, Ana
dc.contributor.author	Berget, Ingunn
dc.contributor.author	Næs, Tormod
dc.contributor.author	Tomasco, Paula Alejandra Varela
dc.date.accessioned	2018-02-07T07:37:05Z
dc.date.available	2018-02-07T07:37:05Z
dc.date.created	2015-09-09T14:11:39Z
dc.date.issued	2015
dc.identifier.citation	Food Quality and Preference. 2015, 46 48-57.
dc.identifier.issn	0950-3293
dc.identifier.uri	http://hdl.handle.net/11250/2483091
dc.description.abstract	Background: The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Alongside the explosion in the amount of accessible sequence data, the focus of classification methods has also shifted. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers with a sliding window are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other in both data usage and modelling strategy. Our study takes as its baseline the widely used naïve Bayes classifier from the RDP project; four other methods were implemented and tested on two different data sets, using both full-length sequences and fragments of typical read length. Results: The differences in classification error between the methods were small, but they were stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best on full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences, the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets, and on both full-length and fragmented sequences, all five methods reached an error plateau. Conclusions: We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data are needed to improve classification further. Classification errors occur most frequently for genera represented by few sequences. For improving the taxonomy and testing new classification methods, a better, more universal and more robust training data set is crucial.
dc.language.iso	eng
dc.title	Pole selection in Polarized Sensory Positioning: Insights from the cognitive aspects behind the task
dc.type	Peer reviewed
dc.type	Journal article
dc.description.version	submittedVersion
dc.source.pagenumber	48-57
dc.source.volume	46
dc.source.journal	Food Quality and Preference
dc.identifier.doi	10.1016/j.foodqual.2015.07.003
dc.identifier.cristin	1262944
dc.relation.project	Norges forskningsråd (Research Council of Norway): 225062
dc.relation.project	Norges forskningsråd (Research Council of Norway): 225096
dc.relation.project	Nofima AS: 201308
dc.relation.project	Nofima AS: 201302
dc.relation.project	Nofima AS: 10841
dc.relation.project	Norges forskningsråd (Research Council of Norway): 233684
cristin.unitcode	7543,3,2,0
cristin.unitcode	7543,3,3,0
cristin.unitname	Råvare og prosess (Raw Materials and Process)
cristin.unitname	Sensorikk, forbruker og innovasjon (Sensory, Consumer and Innovation)
cristin.ispublished	true
cristin.fulltext	preprint
cristin.qualitycode	1
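
The abstract above describes classifiers built on counting K-mers with a sliding window, with the naïve Bayes classifier from the RDP project as the baseline. The following is a minimal illustrative sketch, not the authors' implementation: `kmers`, `kmer_counts` and `KmerNaiveBayes` are hypothetical names, k = 8 is an assumed default, and the pseudocounts for the word probabilities are assumed to follow the scheme published for the RDP classifier. The model scores K-mer presence only; the naïve Bayes Multinomial variant mentioned in the abstract would instead use the per-sequence counts returned by `kmer_counts`.

```python
import math
from collections import Counter, defaultdict


def kmers(seq: str, k: int = 8) -> list[str]:
    """Slide a window of width k one base at a time and return every k-mer."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq: str, k: int = 8) -> Counter:
    """Count how often each k-mer occurs in seq (input for a multinomial model)."""
    return Counter(kmers(seq, k))


class KmerNaiveBayes:
    """Hypothetical RDP-style naive Bayes over k-mer presence/absence.

    Assumed word probabilities:
      P(w)     = (n(w) + 0.5) / (N + 1)       n(w): training sequences containing w
      P(w | g) = (m(w, g) + P(w)) / (M_g + 1) m(w, g): sequences of genus g containing w
    """

    def __init__(self, k: int = 8):
        self.k = k

    def fit(self, seqs: list[str], labels: list[str]) -> "KmerNaiveBayes":
        self.n_total = len(seqs)                      # N
        self.word_seq_count = Counter()               # n(w)
        self.genus_word_count = defaultdict(Counter)  # m(w, g)
        self.genus_size = Counter(labels)             # M_g
        for seq, genus in zip(seqs, labels):
            words = set(kmers(seq, self.k))           # presence, not multiplicity
            self.word_seq_count.update(words)
            self.genus_word_count[genus].update(words)
        return self

    def _prior(self, w: str) -> float:
        return (self.word_seq_count[w] + 0.5) / (self.n_total + 1)

    def log_likelihood(self, seq: str, genus: str) -> float:
        """Sum of log P(w | genus) over the distinct k-mers of seq."""
        m_g = self.genus_size[genus]
        return sum(
            math.log((self.genus_word_count[genus][w] + self._prior(w)) / (m_g + 1))
            for w in set(kmers(seq, self.k))
        )

    def predict(self, seq: str) -> str:
        # Equal genus priors assumed: return the genus with the highest likelihood.
        return max(self.genus_size, key=lambda g: self.log_likelihood(seq, g))
```

With toy data and k = 3, `KmerNaiveBayes(k=3).fit(["ACGTACGTAC", "ACGTACGTTT", "GGGCCCGGGC", "GGGCCCGGTT"], ["A", "A", "B", "B"]).predict("ACGTACG")` returns `"A"`, because every 3-mer of the query occurs in both genus-A training sequences and in none of genus B. Working in log space keeps the product of many small probabilities from underflowing for real 16S sequences.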

