Statistilised meetodid murdekorpuse ühendverbide tuvastamisel
Statistical methods for phrasal verb detection in Estonian dialects
Author(s): Kristel Uiboaed
Subject(s): Language and Literature Studies
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: computational linguistics; corpus linguistics; dialectology; methods and tools; statistics; Estonian
Summary/Abstract: The aim of this study was to assess different statistical methods for automatically extracting collocations from a corpus. To extract the collocations, association measures (AMs) were applied, and association scores (ASs) were calculated for the collocation candidates found in the corpus. An AS indicates the strength of association between two words. An advantage of AMs is that, in addition to the co-occurrence frequency, they also take into account the marginal frequencies of the collocating words. Calculating an AS requires the following data: the co-occurrence frequency, the marginal frequencies of the collocating words, the expected frequency and the sample size.
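The four quantities listed in the abstract can be combined into association scores roughly as follows. This is a minimal sketch, not the author's implementation: the abstract does not say which AMs the study compared, so pointwise mutual information, the t-score and the log-likelihood ratio are used here as commonly cited examples (an assumption). The function name `association_scores` is hypothetical, and `n` is assumed to be the total number of word pairs in the sample.

```python
import math


def association_scores(f_xy: int, f_x: int, f_y: int, n: int) -> dict[str, float]:
    """Compute example association scores for a candidate word pair (x, y).

    f_xy: co-occurrence frequency of x and y
    f_x, f_y: marginal frequencies of x and y
    n: sample size (assumed here: total number of word pairs)
    The choice of measures below is illustrative, not taken from the study.
    """
    if f_xy <= 0:
        raise ValueError("PMI and t-score need an observed co-occurrence > 0")

    # Expected co-occurrence frequency if x and y were independent
    e_xy = f_x * f_y / n

    # Pointwise mutual information (base-2 log of observed / expected)
    pmi = math.log2(f_xy / e_xy)

    # t-score: (observed - expected) / sqrt(observed)
    t_score = (f_xy - e_xy) / math.sqrt(f_xy)

    # Log-likelihood ratio G^2 over the full 2x2 contingency table:
    # cells are (x, y), (x, not y), (not x, y), (not x, not y)
    observed = [f_xy, f_x - f_xy, f_y - f_xy, n - f_x - f_y + f_xy]
    row_totals = [f_x, f_x, n - f_x, n - f_x]
    col_totals = [f_y, n - f_y, f_y, n - f_y]
    g2 = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        e = r * c / n  # expected count for this cell
        if o > 0:  # empty cells contribute 0 to the sum
            g2 += o * math.log(o / e)
    g2 *= 2

    return {"expected": e_xy, "pmi": pmi, "t_score": t_score, "log_likelihood": g2}


# Hypothetical counts for one candidate pair
print(association_scores(f_xy=20, f_x=100, f_y=50, n=10_000))
```

Here the pair co-occurs 20 times where 0.5 co-occurrences would be expected by chance, so all three scores are clearly positive. The example also shows the point made in the abstract: a raw co-occurrence count of 20 says little on its own until it is compared with the expectation derived from the marginal frequencies and the sample size.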
Journal: Eesti Rakenduslingvistika Ühingu aastaraamat
- Issue Year: 2010
- Issue No: 6
- Page Range: 307-326
- Page Count: 19
- Language: Estonian