Heade näitelausete automaattuvastamine eesti keele õppesõnastike jaoks
Automatic detection of good dictionary examples in Estonian learner’s dictionaries
Author(s): Kristina Koppel
Subject(s): Foreign languages learning, Theoretical Linguistics, Lexis
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: corpus lexicography; corpus linguistics; learner’s lexicography; language learning; collocations; usage examples; GDEX; Estonian
Summary/Abstract: This paper explains, firstly, how a tool called Good Dictionary Example (GDEX) (Kilgarriff et al. 2008) scores corpus sentences and helps the lexicographer automatically select the best examples for dictionaries. Secondly, the training datasets containing example sentences from the Estonian Collocations Dictionary (ECD) are introduced. Thirdly, the paper focuses on different parameters of good dictionary examples. Most of the paper is based on an analysis of the training datasets and an evaluation of the previous GDEX configurations. The configurations were evaluated with the graphical user interface GDEX Editor. Based on the results of the statistical analysis and the evaluation of the different configurations, a new configuration, GDEX 1.4, is introduced. GDEX 1.4 implements 16 new parameters. Its main parameters are as follows: the sentence is a full sentence; sentence length is 4–20 tokens; the sentence contains a verb; it does not contain low-frequency words or words from the blacklist; the optimal length is 6–12 tokens; sentences are penalized if they contain more than one adverb, pronoun, proper name, numeral, conjunction, or comma, more than two verbs, or certain pronouns. The output of GDEX 1.4 can be applied to the ECD project and used to create SkELL, a web interface for learners of Estonian.
Journal: Eesti Rakenduslingvistika Ühingu aastaraamat
- Issue Year: 2017
- Issue No: 13
- Page Range: 53-71
- Page Count: 19
- Language: Estonian
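
The main GDEX 1.4 parameters named in the abstract can be illustrated with a minimal sketch. This is not the actual GDEX configuration or its syntax: the function name `gdex_like_score`, the POS labels, the penalty weight of 0.1, and the `blacklist` and `rare_words` sets are all assumptions made for illustration. The full-sentence check and the "certain pronouns" rule are left out because the abstract does not specify them.

```python
from collections import Counter

# Hypothetical POS labels; the tagset actually used is not given in the abstract.
LIMITED_POS = {"ADV", "PRON", "PROPN", "NUM", "CONJ"}  # more than 1 is penalized
PENALTY = 0.1  # assumed weight, not taken from the paper


def gdex_like_score(tokens, blacklist=frozenset(), rare_words=frozenset()):
    """Score a sentence given as (word, pos) pairs; 0.0 means rejected."""
    words = [w for w, _ in tokens]
    pos_counts = Counter(p for _, p in tokens)

    # Hard filters: 4-20 tokens, at least one verb,
    # no blacklisted or low-frequency words.
    if not 4 <= len(tokens) <= 20:
        return 0.0
    if pos_counts["VERB"] == 0:
        return 0.0
    if any(w.lower() in blacklist or w.lower() in rare_words for w in words):
        return 0.0

    score = 1.0
    # Soft preference: the optimal length is 6-12 tokens.
    if not 6 <= len(tokens) <= 12:
        score -= PENALTY
    # Penalize more than one token of each limited category.
    for pos in LIMITED_POS:
        if pos_counts[pos] > 1:
            score -= PENALTY
    # Penalize more than one comma and more than two verbs.
    if words.count(",") > 1:
        score -= PENALTY
    if pos_counts["VERB"] > 2:
        score -= PENALTY
    return max(score, 0.0)
```

Candidate corpus sentences could then be ranked by this score, with rejected sentences (score 0.0) dropped before the lexicographer reviews the rest.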