Problémy automatické morfologické disambiguace češtiny
Problems of automatic morphological disambiguation of Czech
Author(s): Vladimír Petkevič
Subject(s): Language and Literature Studies
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: automatic morphological disambiguation; corpora of the SYN series; improvement in tagging; rule-based and stochastic disambiguation
Summary/Abstract: The article focuses on some of the main problems in the current automatic morphological disambiguation of Czech. Following a description of the disambiguation methods used for disambiguating Czech texts and of their accuracy, the author discusses the main reasons why the correct morphological disambiguation of Czech texts contained in the corpora of the SYN series of the Czech National Corpus project is very difficult to achieve, and why, notwithstanding an improvement in disambiguation (e.g. the SYN2013PUB corpus is tagged in a better way than the SYN2000 corpus), there is still a lot of work to be accomplished. The author concentrates exclusively on the problems of rule-based disambiguation rather than on the stochastic one, trying to identify areas where disambiguation could be improved in the future. The necessity of a reliable disambiguation of Czech texts as a key prerequisite for their successful subsequent syntactic analysis is also stressed.
Journal: Naše řeč
- Issue Year: 2014
- Issue No: 4-5
- Page Range: 194-207
- Page Count: 14
- Language: Czech