Diachronní složka Českého národního korpusu a hranice možností korpusového výzkumu vývoje češtiny
The diachronic part of the Czech National Corpus: Limitations of corpus research into the history of Czech
Author(s): Karel Kučera
Subject(s): Language and Literature Studies
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: annotation; corpus size; corpus structure; diachronic corpus; history of Czech
Summary/Abstract: The paper reviews the present state of the diachronic part of the Czech National Corpus, with a focus on the two-million-word unannotated pivotal corpus Diakorp and its limitations for corpus-based research into the history of Czech. Growth of at least 1,000,000 tokens, lemmatization, and morphological tagging are cited as near-future enhancements to the corpus. A series of thoroughly structured monitoring diachronic corpora, to be built from 2017 on, is considered as a future basis for research into long-term trends in the history of Czech, thus complementing the quantity-oriented Diakorp.
Journal: Naše řeč
- Issue Year: 2014
- Issue No: 4-5
- Page Range: 208-215
- Page Count: 8
- Language: Czech