Word Sense Disambiguation in the National Corpus of Polish
Author(s): Rafał Młodzki, Mateusz Kopeć, Adam Przepiórkowski
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: język polski; korpus; ujednoznacznienie; sensy słów; homonimia; algorytm; corpus; Polish; disambiguation; word sense; homonymy; algorithm
Summary/Abstract: The paper describes the word sense disambiguation procedure used in the National Corpus of Polish. The procedure was applied to a selection of the 106 most frequent ambiguous lexemes, which have almost 3 senses each on average. For the purposes of the experiment, the number of senses listed in Polish dictionaries was reduced by merging fine-grained meanings into more general semantic classes. The Word Sense Disambiguation Development Environment tool was used to distinguish and identify word senses. Context features were extracted by 4 types of feature generators, which produce thematic features, two types of structural features, and keyword features.
Journal: Prace Filologiczne
- Issue Year: 2012
- Issue No: 63
- Page Range: 155-166
- Page Count: 12
- Language: English
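
Illustration: the abstract names the feature types but does not define them, so the following is only a minimal sketch of what context feature generators for word sense disambiguation commonly look like. It is not the Word Sense Disambiguation Development Environment API. All function names, the window size, the offsets, and the cue-word list are hypothetical, and only one structural generator is shown although the paper uses two.

```python
from collections import Counter

# Hypothetical generators; the paper's actual definitions may differ.

def thematic_features(tokens, target_index, window=5):
    """Bag of words within `window` tokens of the target (order ignored)."""
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    context = tokens[lo:target_index] + tokens[target_index + 1:hi]
    return Counter(f"bow={t.lower()}" for t in context)

def structural_features(tokens, target_index, offsets=(-2, -1, 1, 2)):
    """Tokens at fixed positions relative to the target (position kept)."""
    feats = {}
    for off in offsets:
        i = target_index + off
        if 0 <= i < len(tokens):
            feats[f"pos[{off:+d}]={tokens[i].lower()}"] = 1
    return feats

def keyword_features(tokens, target_index, keywords):
    """Binary flags for assumed cue words found elsewhere in the sentence."""
    present = {t.lower() for i, t in enumerate(tokens) if i != target_index}
    return {f"kw={k}": 1 for k in keywords if k in present}

def extract_features(tokens, target_index, keywords):
    """Merge the output of all generators into one feature dictionary."""
    feats = {}
    feats.update(thematic_features(tokens, target_index))
    feats.update(structural_features(tokens, target_index))
    feats.update(keyword_features(tokens, target_index, keywords))
    return feats

if __name__ == "__main__":
    # "zamek" is ambiguous in Polish (castle / lock / zipper).
    sentence = "Stary zamek stoi na wzgórzu nad rzeką".split()
    for name, value in sorted(extract_features(sentence, 1, {"wzgórzu", "drzwi"}).items()):
        print(name, value)
```

The resulting dictionary could be passed to any standard classifier, with one model trained per ambiguous lexeme over its merged sense classes.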