Parallel Corpora within the Russian National Corpus
Author(s): Dmitri Sitchinava
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: korpusy równoległe; wyrównywanie; tagowanie morfosyntaktyczne; tagset; parallel corpora; alignment; morpho-syntactic tagging; tagset
Summary/Abstract: The paper presents the parallel corpora within the Russian National Corpus (RNC). It discusses the principles of text alignment and characterizes and evaluates a number of available alignment tools (e.g. LeoBilingua or HunAlign). The morphological tagging of texts in languages whose grammatical categories differ is described. Moreover, the author enumerates the existing parallel corpora within the RNC and outlines the plans for expanding the project, which is far from complete. Finally, examples of corpus-based research are given.
Journal: Prace Filologiczne
- Issue Year: 2012
- Issue No: 63
- Page Range: 271-278
- Page Count: 8
- Language: English