Keelekasutusreeglite tuletamine ja veatuvastus määrsõna sisaldavate sõnaliigijärjendite näitel
Detecting regularities and errors based on adverb-containing POS-grams

Author(s): Kais Allkivi-Metsoja, Pille Eslon, Jaagup Kippar
Subject(s): Language and Literature Studies, Morphology, Syntax
Published by: Teaduste Akadeemia Kirjastus
Keywords: grammatical error detection; natural language processing; morphosyntax; usage-based approach; n-grams

Summary/Abstract: The article introduces a software tool that detects regularities and errors in Estonian-language texts based on the usage contexts of POS-grams. The tool converts each sentence to a string of part-of-speech (POS) tags and extracts trigrams, i.e., three-word sequences. It then calculates the probabilities of the various preceding and subsequent contexts, each of which can be a certain POS or the beginning or end of a sentence. Error detection relies on comparison with a statistical language model. In this paper, we focus on the contexts of adverb-containing POS-grams, which are prone to word order errors. Our aim is twofold: 1) using the Estonian Reference Corpus, we build a language model and analyse it to describe the POS-grams that are preferably used at the sentence onset or ending; 2) we evaluate the error detection performance of the tool on the EstGEC-L2 test corpus, which consists of error-annotated sentences from second language learner writings. The cut-off value for defining rare contexts is set to 5%. We find that the POS-grams commonly used in sentence onsets are lexicogrammatically more stereotypical, while those preferred at the end of a sentence show more variation. POS-gram analysis also proves useful in pointing out word order errors, unnecessary and missing words, and occasionally word choice and spelling errors (if POS detection is affected). Most frequently, the detected errors violate the V2 word order at the beginning of a sentence/clause. Other word order errors occur mainly at the sentence/clause ending.
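
The procedure outlined in the abstract (POS trigrams, probabilities of their preceding and subsequent contexts, and a 5% cut-off for rare contexts) can be illustrated with a minimal Python sketch. This is not the authors' tool. The function names, the `<S>` and `</S>` boundary markers, the single-letter POS tags, the strict "below 5%" comparison, and the choice to skip trigrams unseen in the reference data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Assumed markers for sentence onset and ending (not taken from the article).
START, END = "<S>", "</S>"


def trigram_contexts(pos_tags):
    """Yield (trigram, preceding context, following context) for one sentence."""
    padded = [START, *pos_tags, END]
    # i indexes the first tag of each trigram inside the padded sequence.
    for i in range(1, len(padded) - 3):
        yield tuple(padded[i:i + 3]), padded[i - 1], padded[i + 3]


def build_model(tagged_sentences):
    """Count how often each context precedes and follows each POS trigram."""
    model = {"preceding": defaultdict(Counter), "following": defaultdict(Counter)}
    for tags in tagged_sentences:
        for trigram, before, after in trigram_contexts(tags):
            model["preceding"][trigram][before] += 1
            model["following"][trigram][after] += 1
    return model


def context_probability(model, side, trigram, context):
    """Relative frequency of a context for a trigram; None if the trigram is unseen."""
    counts = model[side].get(trigram)
    if not counts:
        return None
    return counts[context] / sum(counts.values())


def flag_rare_contexts(model, pos_tags, cutoff=0.05):
    """Return (trigram, side, context, probability) for contexts rarer than the cut-off.

    Assumption: trigrams absent from the reference model are skipped, not flagged.
    """
    flags = []
    for trigram, before, after in trigram_contexts(pos_tags):
        for side, context in (("preceding", before), ("following", after)):
            p = context_probability(model, side, trigram, context)
            if p is not None and p < cutoff:
                flags.append((trigram, side, context, p))
    return flags


# Toy reference data with made-up tags: D = adverb, V = verb, S = noun, P = pronoun.
reference = [["D", "V", "S"]] * 39 + [["P", "D", "V", "S"]]
model = build_model(reference)
for flag in flag_rare_contexts(model, ["P", "D", "V", "S"]):
    print(flag)
```

In this toy data the trigram D V S almost always starts a sentence, so a preceding pronoun falls below the cut-off and is flagged as a rare context. A real model would be built from a POS-tagged reference corpus and would need a policy for unseen trigrams.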

  • Issue Year: 2024
  • Issue No: 69
  • Page Range: 9-34
  • Page Count: 26
  • Language: Estonian