On a Corpus of Older Czech and Its Usage
Author(s): František Martínek, Kateřina Rysová
Subject(s): Language and Literature Studies
Published by: Wydział Polonistyki Uniwersytetu Warszawskiego
Keywords: korpus diachroniczny; język czeski XVI wieku; transliteracja; szyk wyrazów; swobodne modyfikacje wyrazowe; diachronic corpus; humanistic Czech; transliteration; word order; free verbal modifications
Summary/Abstract: The first part of the paper presents the principles of the Corpus of Humanistic Czech (1500-1620). The corpus includes about 600,000 word forms and consists of approximately 50 texts and extracts from longer texts. Although the corpus cannot be representative, it is balanced: the texts have been selected with respect to text type, date of origin and other features. The texts are transcribed, and annotation tags are embedded in the text, following principles similar to those employed in the Czech diachronic corpus DIAKORP. The only difference stems from the fact that lemmatization of the corpus is not planned. Irregular word forms are therefore marked and supplemented with the default form, which enables the user to search the corpus more effectively. A three-level approach to lexical and morphological phenomena provides the theoretical background for determining the default forms of lexical units: the phenomena are classified as individual, collective or systematic. The paper illustrates this distinction with vowel quantity in roots and declension endings, as well as with various forms of borrowings. The second part of the paper presents the results of a study of word order in Humanistic Czech and demonstrates the applicability of the corpus to the study of syntax. Attention is focused on the placement of so-called free verbal modifications expressing manner in competition with so-called actants.
Journal: Prace Filologiczne
- Issue Year: 2012
- Issue No: 63
- Page Range: 219-230
- Page Count: 12
- Language: English