Levels of Annotation in the Bulgarian National Corpus
Author(s): Sia Kolkovska, Svetla Koeva, Diana Blagoeva
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: korpus referencyjny; język bułgarski; anotacja; tokenizacja; tagowanie; tagset; sensy słów; reference corpus; Bulgarian language; annotation; tokenisation; tagging; tagset; word senses
Summary/Abstract: The paper presents the levels of annotation adopted in the Bulgarian National Corpus (BulNC). The first stage of annotation consisted of dividing the text into tokens (words); it was followed by morphosyntactic and semantic analysis. The morphosyntactic analysis is to a great extent unambiguous, since parts of the corpus have been annotated with Bulgarian WordNet word senses. Moreover, the BulNC is annotated syntactically with a parser based on a specially constructed right context-sensitive grammar. All the levels of annotation are exploited in the BulNC search engine.
Journal: Prace Filologiczne
- Issue Year: 2012
- Issue No: 63
- Page Range: 147-154
- Page Count: 8
- Language: English