Statistilised meetodid ühendverbide tuvastamisel tekstikorpusest.
Statistical methods for Estonian particle verb extraction from text corpora.
Author(s): Aedmaa Eleri
Subject(s): Theoretical Linguistics, Computational Linguistics, Baltic Languages
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: computational linguistics; corpus linguistics; multi-word expressions; particle verbs; statistics; Estonian
Summary/Abstract: The present article compares lexical association measures (AMs) for automatic extraction of Estonian particle verbs from the newspaper part of the Estonian Reference Corpus. The main purpose of this study is to ascertain the best symmetrical AM for Estonian particle verb extraction. The central focus lies on the impact of the corpus size on the performance of the compared symmetrical association measures. In addition, asymmetrical AMs have been included in the study to observe their suitability for Estonian particle verb extraction. Five symmetrical association measures have been used, namely the t-test, log-likelihood, χ², mutual information, and minimum sensitivity, as well as two asymmetrical association measures, namely conditional probability and ΔP. The association measures were compared against the co-occurrence frequency of verb and verbal particle. The analysis of the comparison reveals that the t-test achieved the best precision values and the corpus size has an impact on the performance of the AMs. As the corpus size increased, the performances of the t-test, log-likelihood, χ² and minimum sensitivity increased and the precision of mutual information decreased. The performance of (simple) frequency did not change significantly as the size of the corpus increased. The comparison of symmetrical and asymmetrical AMs revealed that asymmetrical association measures are suitable for the task of Estonian particle verb extraction and provide slightly different and more detailed information about the extracted particle verbs. The results presented in this article confirm that further study of asymmetrical AMs is necessary and more experiments are needed to broaden our knowledge of the performance of asymmetrical AMs.
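Illustrative sketch (not part of the original record): the association measures named in the abstract can be computed from a 2×2 contingency table of verb and particle counts. The Python below uses common textbook formulations of these measures; the article's exact formulas, any frequency thresholds, and the corpus preprocessing are not given here, so the definitions, the `Counts` class, the function name, and the example numbers are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for corpus counts (names are assumptions)."""
    pair: int      # co-occurrences of the verb with the particle
    verb: int      # all occurrences of the verb
    particle: int  # all occurrences of the particle
    total: int     # sample size N (e.g. number of candidate pairs)


def association_scores(c: Counts) -> dict:
    """Score one verb-particle candidate with textbook AM definitions."""
    if not (0 < c.pair <= min(c.verb, c.particle)):
        raise ValueError("pair count must be positive and within the margins")
    if c.verb >= c.total or c.particle >= c.total:
        raise ValueError("margins must be smaller than the sample size")

    # Observed 2x2 table: rows = verb / other verbs, cols = particle / other.
    o11 = c.pair
    o12 = c.verb - c.pair
    o21 = c.particle - c.pair
    o22 = c.total - c.verb - c.particle + c.pair
    if o22 < 0:
        raise ValueError("inconsistent counts")

    r1, r2 = c.verb, c.total - c.verb          # row totals
    c1, c2 = c.particle, c.total - c.particle  # column totals
    n = c.total

    observed = [o11, o12, o21, o22]
    # Expected counts under independence of verb and particle.
    expected = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
    e11 = expected[0]

    return {
        # Symmetrical measures
        "frequency": o11,
        "t_test": (o11 - e11) / math.sqrt(o11),
        "log_likelihood": 2 * sum(
            o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
        ),
        "chi_squared": sum(
            (o - e) ** 2 / e for o, e in zip(observed, expected)
        ),
        "mutual_information": math.log2(o11 / e11),
        "minimum_sensitivity": min(o11 / r1, o11 / c1),
        # Asymmetrical measures: one value per direction
        "p_particle_given_verb": o11 / r1,
        "p_verb_given_particle": o11 / c1,
        "delta_p_particle_given_verb": o11 / r1 - o21 / r2,
        "delta_p_verb_given_particle": o11 / c1 - o12 / c2,
    }


if __name__ == "__main__":
    # Made-up counts, not taken from the Estonian Reference Corpus.
    example = Counts(pair=30, verb=100, particle=200, total=10_000)
    for name, value in association_scores(example).items():
        print(f"{name}: {value:.4f}")
```

The asymmetrical measures return a separate value for each direction (particle given verb, and verb given particle), which is one way to see how they can carry more detailed information about a candidate than a single symmetrical score.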
Journal: Eesti Rakenduslingvistika Ühingu aastaraamat
- Issue Year: 2015
- Issue No: 11
- Page Range: 37-54
- Page Count: 18
- Language: Estonian