From Electronic Publication of a Medieval Manuscript to Big Data, or What Artificial Intelligence Knows about the Beginning of Slavic Books

От электронной публикации средневековой рукописи до больших данных, или Что знает искусственный интеллект о начале славянской книжности
From Electronic Publication of a Medieval Manuscript to Big Data, or What Artificial Intelligence Knows about the Beginning of Slavic Books

Author(s): Victor A. Baranov
Subject(s): Language studies, Language and Literature Studies
Published by: Кирило-Методиевски научен център при Българска академия на науките
Keywords: medieval Slavic manuscripts; electronic edition; text corpus; corpus manager.

Summary/Abstract: The article describes the preparation of machine-readable linguistic resources based on medieval Slavic written sources and their use in systems for the automated and automatic processing of large volumes of text data. The history of this area of applied Paleoslavistics is outlined briefly through several projects that created electronic publications, collections, and corpora of Slavic manuscripts. Particular attention is paid to the stages of development and the material of the Manuscript historical corpus (manuscripts.ru), which contains marked-up transliterations of Glagolitic manuscripts and transcriptions of Cyrillic manuscripts of the 10th–15th centuries, as well as specialized tools for processing, displaying, and analyzing the non-standard graphic and orthographic features and the structure of the texts. Unfortunately, the labor-intensive and complex process of preparing copies of manuscripts and marking them up is still the only way to convert a graphic image into machine-readable form. The author notes that tagged collections built from Slavic manuscripts can be used both to create manuscript-recognition models in existing HTR (handwritten text recognition) systems and to develop new specialized tools for recognizing and analyzing the Slavic manuscript heritage.

  • Issue Year: 2023
  • Issue No: 3
  • Page Range: 163-180
  • Page Count: 18
  • Language: Russian