Latgaliešu valodas korpuss citu Eiropas mazāk lietoto valodu kontekstā: korpusa raksturojums, lietojums un potenciālā iespējošana
The Corpus of Latgalian in the Context of Other Lesser Used Languages of Europe: Characterization, Usage and Potential

Author(s): Sanita Martena, Ilze Briška, Nicole Nau
Subject(s): Sociolinguistics, Sociology of Culture, Ethnic Minorities Studies, Sociology of Literature
Published by: Latvijas Universitātes Literatūras, folkloras un mākslas institūts
Keywords: Latgalian; corpus; corpus of Latgalian; MuLa; regional languages; lesser-used languages of Europe

Summary/Abstract: The article reports on the ongoing work to complete and further develop the corpus of contemporary written Latgalian (MuLa). It gives an overview of the sources and the principles of compiling the corpus and compares it with corpora of other lesser used languages of Europe. In addition, it analyses the profile of current and potential users, users’ experience with the corpus so far, and their wishes for the future. The first version of MuLa, which has been publicly available since 2012, contains 1 million running words. It will be enlarged to 2 million words in the extended and corrected version prepared in 2020–2022. Corpora of lesser used languages play an important role in the documentation and development of these languages. They are also a valuable resource for the preparation of linguistic tools and teaching materials. While MuLa has neither the size nor the functionality of the corpora of such well-cared-for European regional languages as Basque, Welsh, or Sami, the fact that a corpus exists and is being further developed puts Latgalian in a better position than some other regional languages in Europe. Due to a shortage of financial and human resources, corpora of lesser used languages are often compiled from sources gathered automatically from the Internet. MuLa, by contrast, is still compiled manually, which allows greater control over register diversity and balance, as well as linguistic quality. The corpus is not tagged, but it is accessible to everyone. Data about the usage and users of MuLa since 2012 were collected with an online questionnaire answered by 214 respondents. The study shows that the corpus has been used very little, mostly by researchers and a few other professionals. On the other hand, many respondents expressed potential interest in the corpus and had ideas about how it could be used for learning about Latgalian and for further developing their linguistic skills.
For the corpus to be used more broadly, promotion and the spread of information within society are indispensable. Still more important are cooperation and a constant dialogue with teachers and university students.

  • Issue Year: 2022
  • Issue No: 47
  • Page Range: 208-224
  • Page Count: 17
  • Language: Latvian