První korpus mluvčích češtiny v dětském věku
The first corpus of childhood Czech speakers

Author(s): Anna Chromá, Klára Matiasovitsová
Subject(s): Language and Literature Studies, Applied Linguistics, Morphology, Language acquisition, Psycholinguistics, Sociolinguistics, Developmental Psychology, Scientific Life
Published by: Univerzita Karlova v Praze - Filozofická fakulta, Vydavatelství
Keywords: Chroma corpus; Czech language; CHILDES; language acquisition; morphological annotation; linguistic analysis

Summary/Abstract: The article discusses the Chroma corpus, a newly published dataset capturing the spoken interactions of monolingual Czech children aged 19 to 49 months. This corpus fills a chronological gap in Czech language acquisition research and is part of the international CHILDES database. The Chroma corpus includes audio recordings of spontaneous interactions between children and their caregivers, recorded longitudinally over 11 to 27 months. These recordings are transcribed using the CHAT transcription system, which is standard for CHILDES. The corpus contains 99,388 tokens in children's utterances and 238,211 tokens in adult utterances. The transcriptions are annotated morphologically using the MorphoDiTa tool, allowing for detailed linguistic analysis. The Chroma corpus is a significant resource for studying various linguistic phenomena, including morphological and syntactic innovations, and contributes to the broader understanding of first language acquisition.

  • Issue Year: 106/2024
  • Issue No: 1
  • Page Range: 107-109
  • Page Count: 3
  • Language: Czech