Dažniausios lietuvių kalbos morfologinio daugiareikšmiškumo rūšys ir jų automatinis vienareikšminimas
The Most Frequent Types of the Morphological Ambiguity of the Lithuanian Language and the Automatical Disambiguation of Them
Author(s): Erika Rimkutė, Aušra Grybinaitė
Subject(s): Language and Literature Studies
Published by: Kauno Technologijos Universitetas
Summary/Abstract: The article examines morphological ambiguity in an automatically tagged corpus of the Lithuanian language. The morphologically tagged corpus reveals a high degree of ambiguity: almost 50 percent of word forms are ambiguous. The most frequent types of ambiguity are syncretism of singular and plural in third-person verbs, syncretism among non-inflected parts of speech, and case syncretism of nouns. The article presents the linguistic and statistical rules and the algorithms that were created for morphological disambiguation. The disambiguation constraints have been implemented in a program that calculates the attributes and creates the learning set needed to build the decision trees. So far, about 40 percent of homoforms have been analysed, and a disambiguation accuracy of 25 percent has been achieved.
Journal: Kalbų Studijos
- Issue Year: 2004
- Issue No: 5
- Page Range: 74-78
- Page Count: 5
- Language: Lithuanian