The Slovak National Corpus and Its Corpus Linguistic Resources
Author(s): Radovan Garabík, Mária Šimková
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: Narodowy Korpus Języka Słowackiego; anotacja; narzędzia korpusowe; kolokacje; korpus języka mówionego; Slovak National Corpus; annotation; corpus tools; collocations; spoken language corpus
Summary/Abstract: The paper deals with selected projects based on the Slovak National Corpus (SNK), a large, representative corpus of modern written Slovak (since the 1953 orthography reform). Currently, the corpus consists of over 770 million words and is gradually being expanded. The SNK embraces a number of subcorpora representing various types of specialized discourse: fiction, professional texts, informational texts as well as a balanced subcorpus. The texts included in the SNK are automatically morphologically annotated, and it is possible to extract detailed meta-information. The corpus is accessible to all non-commercial users and is equipped with a search engine enabling queries concerning morphological and syntactic features, collocations, as well as statistical analysis. Other corpora connected with the SNK project include: the 1.2-million manually morphologically annotated corpus, the Corpus of Spoken Slovak (containing about 1.65 million tokens) and a treebank of 50,000 sentences. The Slovak Terminology Database and the Slovak WordNet are also being developed within the project.
Journal: Prace Filologiczne
- Issue Year: 2012
- Issue No: 63
- Page Range: 109-120
- Page Count: 12
- Language: English