Recognizing Handwritten Text in Slavic Manuscripts: a Neural-Network Approach Using Transkribus
Author(s): Achim Rabus
Subject(s): Language and Literature Studies, Foreign languages learning, Applied Linguistics, Computational linguistics, Philology, Translation Studies
Published by: Институт за литература - БАН
Keywords: Church Slavonic; Transkribus; automatic transcription; machine learning; neural networks; artificial intelligence
Summary/Abstract: The paper discusses the automatic text recognition capabilities of neural network models specifically trained to recognize different styles of Church Slavonic handwriting within the software platform Transkribus. Computed character error rates of the models are in the range of 3 to 5 percent; real-life performance shows that specifically trained models, by and large, recognize simple (non-superscript) characters correctly most of the time. The error rate is higher with superscript letters, abbreviations, and word separation. Combined models consisting of training data from different sources are capable of transcribing different styles of Slavic handwriting with low error rates. Automatic text recognition using Transkribus and the models presented in this paper can help improve the efficiency of the process of digitizing Church Slavonic manuscripts and thus boost the number of digitized sources available in the future.
Journal: Scripta & e-Scripta
- Issue Year: 2019
- Issue No: 19
- Page Range: 9-32
- Page Count: 24
- Language: English