Automatic part-of-speech tagging of the Tartu Corpus of Estonian Learner English with CLAWS7: impact of learner errors
Author(s): Liina Tammekänd, Reeli Torn-Leesik
Subject(s): Language studies, Foreign languages learning, Computational linguistics
Published by: Vilniaus Universiteto Leidykla
Keywords: learner English; automatic POS-tagging; learner errors; TCELE; CLAWS7
Summary/Abstract: The present paper, which is a continuation of Tammekänd and Torn-Leesik’s (2022) study, aims to examine how learner errors affect the CLAWS7 tagger’s automated assignment of part-of-speech (POS) tags to a sample of 24,812 words of the Tartu Corpus of Estonian Learner English (TCELE). Learner errors causing tagging errors in the sample were identified, based on which a working error taxonomy was created. The POS-tagged and error-tagged samples were collated and compared to map correlations between learner and tagging errors. Error groups that correlated with significantly increased rates of tagging errors were identified. Possible reasons were suggested to account for the impact of learner errors on the tagger’s performance. The CLAWS7 tagger misanalysed only 2.8% of forms representing learners’ language errors but assigned wrong tags to every fifth spelling error (22%).
Journal: Taikomoji kalbotyra
- Issue Year: 2023
- Issue No: 20
- Page Range: 126-140
- Page Count: 15
- Language: English