Stratified historical corpus of Estonian 1800–1940

Author(s): Peeter Tinits
Subject(s): Historical Linguistics, Sociolinguistics, Baltic Languages
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: corpus; language resource; metadata; historical sociolinguistics; sociolinguistic variable; written Estonian

Summary/Abstract: The article introduces a stratified historical corpus of Estonian 1800–1940. A stratified corpus allows for sociolinguistic comparisons of language use between past authors, taking into account their background and biographical details (e.g. native dialect area, age cohort, attained education) or the publication details (e.g. genre of publication or publisher). The corpus assembles texts from a number of different public archives and combines them with metadata on their publication details and the authors’ backgrounds. At the time of publication, the corpus consists of 4,412 works from 1,188 author names, constituting 11% of the works registered in the Estonian National Bibliography from 1800–1940. The author names are associated with biographical information where possible. Three use cases on studying orthographic variation are introduced as examples of how the corpus can help study past language communities. The corpus is published online to allow updates as the data are improved and more texts are digitized.

  • Issue Year: 2023
  • Issue No: 19
  • Page Range: 175-194
  • Page Count: 20
  • Language: English