Multiple Interpretation and Fragmented Texts within a Historical Corpus: The Case of Old East Slavic Vernacular Writing
Author(s): Dmitri Sitchinava
Subject(s): Language studies, Language and Literature Studies, Applied Linguistics, Eastern Slavic Languages
Published by: Jazykovedný ústav Ľudovíta Štúra Slovenskej akadémie vied
Keywords: lacunae; epigraphy; fragmented text; historical corpus; birchbark letters annotation; lemmatization; Old East Slavic
Summary/Abstract: The paper addresses the issue of fragmented and/or ambiguously interpreted texts within the corpora of Old East Slavic vernacular writing. One of these corpora, the corpus of Old East Slavic birchbark letters, is already available; the other, comprising the texts of Old East Slavic inscriptions, is in preparation. Because many birchbark and epigraphic texts are fragmentary, their lemmatization and grammatical tagging may be uncertain, and multiple interpretations may coexist. Some lemmas survive only in fragments, which are nevertheless relevant for the study of the lexicon. The grammatical status of many fragments can be firmly established even when lexical information is lacking. However, the relevant data on these fragments is not available in word indices and corpora that take into account only the best-preserved word forms. The paper presents how such word forms are represented and annotated within the Old East Slavic vernacular corpora, gives the relative frequencies of these phenomena within the birchbark letter corpus, and offers case studies showing the relevance of annotating fragmented forms. Existing approaches, namely the EpiDoc standard for classical epigraphy and the Hittite syntactic treebanks, are also briefly presented and compared with the solution adopted for the Old East Slavic vernacular corpora.
Journal: Jazykovedný časopis
- Issue Year: 74/2023
- Issue No: 1
- Page Range: 266-274
- Page Count: 9
- Language: English