Računalna obradba hrvatskih korpusa: povijest, stanje i perspektive
Croatian Corpus Processing: History, State of the art and Perspectives

Author(s): Marko Tadić
Subject(s): Language and Literature Studies
Published by: Hrvatsko filološko društvo

Summary/Abstract: This article surveys Croatian corpus processing. It lists the most important projects from the first Croatian computer corpus (Gundulić's Osman) up to the present. The article focuses on the Croatian National Corpus, which is the central project in corpus linguistics in Croatia today. The Croatian National Corpus consists of two parts: 1) a representative 30-million-word Corpus of Contemporary Croatian Language and 2) the Croatian Electronic Text Archive. The 30-million Corpus constitutes the first phase of the Croatian National Corpus, while the second phase will concentrate on expanding the contents of the Croatian Electronic Text Archive. The 30-million Corpus, which is now at the stage of advanced planning and of testing the software and a pilot corpus (7.67 million running words), should be finished in the year 2000.

  • Issue Year: 1997
  • Issue No: 43-44
  • Page Range: 387-394
  • Page Count: 8
  • Language: Croatian