Russian-Estonian code-switching corpus: elaboration of encoding principles

Vene-eesti koodivahetuse korpus: kodeerimispõhimõtete väljatöötamine
Russian-Estonian code-switching corpus: elaboration of encoding principles

Author(s): Anastassija Zabrodskaja
Subject(s): Language and Literature Studies
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: corpus linguistics; code-switching; Estonian; Russian

Summary/Abstract: The paper has several aims: 1) to introduce the goals of the LIPPS group and LIDES in the Estonian context, 2) to give an overview of the Russian-Estonian code-switching corpus and its sub-corpora (in preparation at Tallinn University), 3) to survey the standards used to transcribe and encode multilingual data in the LIDES database, and 4) to formulate some principles of morphological encoding. Several sub-corpora are planned within the corpus: (a) bilingual TV talk shows; (b) data from bilingual Tallinn; (c) data from the predominantly Russian-speaking North East (Narva and Kohtla-Järve). The encoding of Russian-Estonian code-switching probably requires a special approach: Russian is written in Cyrillic letters, whereas Estonian uses the Roman script. The different alphabets may lead to Estonian elements being treated differently in writing and in oral communication. Both Estonian and Russian have a developed inflectional morphology. Full integration of a noun means gender assignment and the addition of inflectional morphology (case, number, or case and number). Empirical observations show that a single Estonian noun is not always fully morphologically integrated into the Russian matrix. The authors of the corpus are interested in instances where Russian inflectional morphology is absent although the noun fits structurally into Russian declension classes. The focus is also on items that cannot be clearly assigned to either language. If a speaker speaks Estonian with a Russian accent, common internationalisms as well as Estonian proper names are ambiguous. The retention of two stresses in Estonian compound nouns in the Russian matrix is one of the relevant features to be encoded. The second part of the article discusses morphology in the Russian-Estonian LIDES corpus, with primary attention given to: 1) morphological and phonic integration or the lack thereof, and 2) compromise forms and new creations. It is necessary to introduce a special encoding system for in-between items and to devise a way of encoding lack of integration in order to distinguish it from zero endings. Numerous relevant examples are presented in the paper.

  • Issue Year: 2007
  • Issue No: 3
  • Page Range: 321-338
  • Page Count: 18
  • Language: Estonian