Problémy automatické morfologické disambiguace češtiny

Vladimír Petkevič

Problémy automatické morfologické disambiguace češtiny
Problems of automatic morphological disambiguation of Czech

Author(s): Vladimír Petkevič
Subject(s): Language and Literature Studies
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: automatic morphological disambiguation; corpora of the SYN series; improvement in tagging; rule-based and stochastic disambiguation

Summary/Abstract: The article focuses on some of the main problems in the current automatic morphological disambiguation of Czech. Following a description of the methods used for disambiguating Czech texts and of their accuracy, the author discusses the main reasons why correct morphological disambiguation of the Czech texts contained in the corpora of the SYN series of the Czech National Corpus project is very difficult to achieve, and why, notwithstanding an improvement in disambiguation (e.g. the SYN2013PUB corpus is tagged more accurately than the SYN2000 corpus), a lot of work remains to be done. The author concentrates exclusively on the problems of rule-based disambiguation rather than stochastic disambiguation, trying to identify areas where disambiguation could be improved in the future. The author also stresses that reliable disambiguation of Czech texts is a key prerequisite for their successful subsequent syntactic analysis.

Journal: Naše řeč

Issue Year: 2014
Issue No: 4-5
Page Range: 194-207
Page Count: 14
Language: Czech
