Narodowy korpus języka rosyjskiego jako źródło informacji o leksemach partykułowych
The Russian National Corpus as an information source on particles

Author(s): Ewa Konefał
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: wyrazy pomocnicze; homonimia; ujednoznacznienie; znakowanie morfosyntaktyczne; ciągi polisegmentalne; function words; homonymy; disambiguation; part-of-speech tagging; poly-segmental sequences

Summary/Abstract: Function words, including particles, are among the most difficult lexemes of any language to describe. This is due to the extensive homonymy within this group and to the lack of specific criteria for their categorisation. The problems of describing particles are well illustrated by the way they are part-of-speech tagged in the Russian National Corpus and by unsuccessful attempts at word-sense disambiguation. However, the wealth of source material available in the Corpus, together with appropriate tools for its analysis, creates great opportunities for observation. These opportunities concern both the characterisation of simple particles on the basis of metadata and frequency data (including the verification of much lexicographic data about them) and the identification of poly-segmental sequences.

  • Issue Year: 2015
  • Issue No: 67
  • Page Range: 197-214
  • Page Count: 18
  • Language: Polish