A2–C1-TASEME EKSAMITEKSTIDE KÄÄNDSÕNAKASUTUS
USE OF NOMINALS IN ESTONIAN A2–C1-LEVEL EXAM WRITINGS
Author(s): Kais Allkivi-Metsoja
Subject(s): Morphology, Syntax, Lexis, Computational linguistics, Finno-Ugrian studies, Philology
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: natural language processing; morphology; CEFR levels; written learner language; Estonian
Summary/Abstract: In this study, natural language processing (NLP) is used to analyse nominal inflection in Estonian proficiency examination writings representing the CEFR levels A2–C1. The aim is to define the nominal features that distinguish learner language production at each proficiency level. For this purpose, the frequency and variation of inflectional forms are measured in two ways: a) for the nominal parts of speech (PoSs) in total, i.e., considering the use of nouns, pronouns, adjectives and numerals; b) for nouns, pronouns and adjectives individually (numerals were discarded due to low frequency). The analysed corpus contains 480 texts, 120 for each level. Nominal features based on the grammatical categories of number, case and degree of comparison are extracted from the morphologically tagged and manually corrected output of the Stanza NLP toolkit. Relevant features are selected according to the following criteria: they correlate with the proficiency level, their values change monotonically, and there are statistically significant differences between (some) adjacent levels. A2–C1-level texts are consistently distinguished by the number of cases used in the text as well as the ratio of singular and plural forms. The changes in the frequency of nominal inflectional forms mainly occur from level B1 to C1. The use of the translative, nominative and genitive cases is more strongly related to the text level, while the partitive, inessive, elative and comitative cases and comparative adjectives also differentiate some levels. Furthermore, the study indicates that it is beneficial to observe inflection-based features separately for each PoS when analysing L2 development. Firstly, the PoS-specific frequencies of some grammatical categories increase at different stages of proficiency. Secondly, changes may emerge for certain PoSs only.
The identified criterial features could be used for automated assessment of Estonian L2 writings alongside lexical, syntactic and other linguistic features. The results can also help to specify the CEFR level descriptions for Estonian.
Journal: Eesti Rakenduslingvistika Ühingu aastaraamat
- Issue Year: 2022
- Issue No: 18
- Page Range: 33-53
- Page Count: 21
- Language: Estonian
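
The abstract names two features that distinguish all levels: the number of cases used in a text and the ratio of singular and plural forms. The sketch below shows one way such features might be computed from morphologically tagged tokens. It is not the author's implementation. The input format (pairs of a universal PoS tag and a `Case=...|Number=...` feature string, in the style of Universal Dependencies annotation as produced by taggers such as Stanza), the set `NOMINAL_UPOS`, the function names and the use of plural share as the "ratio" are all assumptions made for illustration, and the example tokens are hand-written rather than real tagger output.

```python
from collections import Counter

# Assumption: nominal parts of speech as UD-style tags (nouns, pronouns,
# adjectives, numerals). The study's exact tag inventory is not given here.
NOMINAL_UPOS = {"NOUN", "PRON", "ADJ", "NUM"}


def parse_feats(feats):
    """Turn a 'Case=Nom|Number=Sing' string into a dict; empty input gives {}."""
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def nominal_features(tokens):
    """Compute illustrative nominal features for one text.

    tokens: iterable of (upos, feats) pairs, one per word.
    Returns the number of distinct cases, the share of plural forms among
    number-marked nominals, and the raw case counts.
    """
    cases = Counter()
    numbers = Counter()
    for upos, feats in tokens:
        if upos not in NOMINAL_UPOS:
            continue  # verbs, adverbs etc. are ignored
        parsed = parse_feats(feats)
        if "Case" in parsed:
            cases[parsed["Case"]] += 1
        if "Number" in parsed:
            numbers[parsed["Number"]] += 1
    total = sum(numbers.values())
    return {
        "n_cases": len(cases),
        "plural_share": numbers["Plur"] / total if total else 0.0,
        "case_counts": dict(cases),
    }


# Hand-written, hypothetical tags for a short learner sentence.
example = [
    ("PRON", "Case=Nom|Number=Sing|Person=1|PronType=Prs"),
    ("VERB", "Mood=Ind|Number=Sing|Person=1|Tense=Pres"),
    ("ADJ", "Case=Ine|Degree=Pos|Number=Sing"),
    ("NOUN", "Case=Ine|Number=Sing"),
    ("ADV", None),
    ("NOUN", "Case=Com|Number=Plur"),
]
features = nominal_features(example)
print(features["n_cases"], features["plural_share"])
```

Running the same function on subsets of tokens filtered by PoS would give the PoS-specific variants of these features that the abstract recommends observing separately.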