New Developments in Tagging Pre-modern Orthodox Slavic Texts
Author(s): Yves Scherrer, Susanne Mocken, Achim Rabus
Subject(s): Language studies, Language and Literature Studies, Theoretical Linguistics, Studies of Literature, Eastern Slavic Languages, Philology, Theory of Literature
Published by: Институт за литература - БАН (Institute of Literature, Bulgarian Academy of Sciences)
Keywords: Church Slavonic; natural language processing; part-of-speech tagging; Old Russian; neural networks
Summary/Abstract: Pre-modern Orthodox Slavic texts pose certain difficulties when it comes to part-of-speech and full morphological tagging. Orthographic and morphological heterogeneity makes it hard to apply resources that rely on normalized data, which is why previous attempts to train part-of-speech (POS) taggers for pre-modern Slavic often apply normalization routines. In the current paper, we further explore the normalization path; at the same time, we use the statistical CRF tagger MarMoT and a newly developed neural network tagger, both of which cope better with variation than previously applied rule-based or statistical taggers. Furthermore, we conduct transfer experiments to apply Modern Russian resources to pre-modern data. Our experiments show that while transfer experiments could not improve tagging performance significantly, state-of-the-art taggers reach between 90% and more than 95% tagging accuracy and thus approach the tagging accuracy of modern standard languages with rich morphology. Remarkably, these results are achieved without the need for normalization, which makes our research of practical relevance to the Paleoslavistic community.
Journal: Scripta & e-Scripta
- Issue Year: 2018
- Issue No: 18
- Page Range: 9-33
- Page Count: 25
- Language: English