ELEKTRONICZNY KORPUS TEKSTÓW POLSKICH Z XVII I XVIII W. – PROBLEMY TEORETYCZNE I WARSZTATOWE
ELECTRONIC CORPUS OF 17TH- AND 18TH-CENTURY POLISH TEXTS – THEORETICAL AND WORKSHOP PROBLEMS

Author(s): Włodzimierz Gruszczyński, Dorota Adamiec, Renata Bronikowska, Aleksandra Wieczorek
Contributor(s): Monika Czarnecka (Translator)
Subject(s): Theoretical Linguistics, Historical Linguistics, Western Slavic Languages, 17th Century, 18th Century, Philology
Published by: Dom Wydawniczy ELIPSA
Keywords: electronic text corpus; historical corpus; 17th-18th-century Polish; natural language processing

Summary/Abstract: This paper presents the Electronic Corpus of 17th- and 18th-century Polish Texts (KorBa), a large (13.5-million) annotated historical corpus available online. Its design was modelled on the assumptions of the National Corpus of Polish (NKJP), yet the specific nature of the historical material required certain modifications of the solutions applied in NKJP: two forms of text representation (transliteration and transcription) were introduced, a principle for marking foreign-language fragments was adopted, and the tagset was adapted to describe the grammatical structure of Middle Polish. The texts collected in KorBa are diverse in chronological, geographical, stylistic, and thematic terms, although the requirement that the corpus be representative and balanced was not fully met, owing in part to limited access to the material. The work on the corpus was automated to a large extent by means of natural language processing tools.

  • Issue Year: 2020
  • Issue No: 08
  • Page Range: 32-51
  • Page Count: 20
  • Language: Polish