Računalna obradba hrvatskih korpusa: povijest, stanje i perspektive

Marko Tadić

Croatian Corpus Processing: History, State of the art and Perspectives

Author(s): Marko Tadić
Subject(s): Language and Literature Studies
Published by: Hrvatsko filološko društvo

Summary/Abstract: This article surveys Croatian corpus processing. It lists the most important projects from the first Croatian computer corpus (Gundulić's Osman) to the present. The article focuses on the Croatian National Corpus, the central project in corpus linguistics in Croatia today. The Croatian National Corpus consists of two parts: 1) a representative 30-million-word Corpus of Contemporary Croatian Language and 2) the Croatian Electronic Text Archive. The 30-million-word Corpus constitutes the first phase of the Croatian National Corpus, while the second phase will concentrate on expanding the contents of the Croatian Electronic Text Archive. The 30-million-word Corpus, currently at the stage of advanced planning and of testing the software and a pilot corpus (7.67 million running words), should be finished in the year 2000.

Journal: Suvremena lingvistika

Issue Year: 1997
Issue No: 43-44
Page Range: 387-394
Page Count: 8
Language: Croatian
