Wege zur verbesserten automatischen Annotation des mittelbulgarischen Kirchenslawischen
Ways to improve automatic annotation of Middle Bulgarian Church Slavonic
Author(s): Fabio Maion
Subject(s): Language studies, Language and Literature Studies, Applied Linguistics, Computational linguistics, Philology, Translation Studies
Published by: Институт за литература - БАН
Keywords: Middle Bulgarian Church Slavonic; natural language processing; part-of-speech tagging; Middle Greek; Dioptra
Summary/Abstract: The last decade has brought an upswing in research on natural language processing, yet historical language stages remain largely underrepresented. Middle Bulgarian Church Slavonic, a language variety with considerable literary output, is a prime example. This paper shows how annotated texts of related language varieties can be used to annotate texts written in Middle Bulgarian Church Slavonic, such as the 14th-century translation of the Dioptra. In particular, I present a way of adapting the available training data and of reducing the differences between training and test data, thereby improving the results of the automatic morphological annotation. Moreover, a comparison with the original work, written in Byzantine Greek, is shown to further improve the annotation by carefully disambiguating homonymous word forms. The results can benefit research on Middle Bulgarian Church Slavonic, as they show how texts in this variety can be annotated without authentic training data. The proposed method may also be of use beyond Slavonic Studies: combining training data from genetically related language varieties with translations may help annotate other underrepresented language varieties as well.
Journal: Scripta & e-Scripta
- Issue Year: 2022
- Issue No: 22
- Page Range: 365-390
- Page Count: 26
- Language: German