Performance of Generic HTR Models on Historical Cyrillic and Glagolitic: Comparison of Engines
Author(s): Achim Rabus, Walker Thompson
Subject(s): Language studies, Language and Literature Studies, Theoretical Linguistics, Applied Linguistics, Historical Linguistics, Computational linguistics, South Slavic Languages, Philology, Translation Studies
Published by: Институт за литература - БАН
Keywords: handwritten text recognition; Transkribus; machine learning; Cyrillic palaeography; Glagolitic printings
Summary/Abstract: The present study offers a comparative evaluation of the performance of different AI-based digital tools for handwritten text recognition (HTR) on historical manuscripts and prints. The focus is on generic models capable of transcribing a range of texts in a similar script. The training dataset for these comprises Old Cyrillic ustav and poluustav manuscripts, on the one hand, and early Glagolitic printed books, on the other. We give an overview of the performance statistics for the HTR platforms Transkribus and eScriptorium as well as for the command-line tool Calamari. In each case, we additionally offer a close, qualitative analysis of select examples in order to convey a sense of the models’ real-world performance. In this way, our study supplies comparative data on the respective capabilities of these technologies that ought to be of interest to scholars working with them in digital humanities projects.
Journal: Scripta & e-Scripta
- Issue Year: 2023
- Issue No: 23
- Page Range: 11-34
- Page Count: 24
- Language: English