Dažniausios lietuvių kalbos morfologinio daugiareikšmiškumo rūšys ir jų automatinis vienareikšminimas
The Most Frequent Types of the Morphological Ambiguity of the Lithuanian Language and the Automatical Disambiguation of Them

Author(s): Erika Rimkutė, Aušra Grybinaitė
Subject(s): Language and Literature Studies
Published by: Kauno Technologijos Universitetas

Summary/Abstract: The article examines morphological ambiguity in an automatically tagged corpus of the Lithuanian language. The morphologically annotated corpus shows that the language is highly ambiguous: almost 50 percent of word forms are ambiguous. The most frequent types of ambiguity are the syncretism of singular and plural in third-person verb forms, syncretism among uninflected parts of speech, and case syncretism in nouns. The article presents the linguistic and statistical rules and the algorithms that were created for morphological disambiguation. The disambiguation constraints have been implemented in a program that calculates the attributes and builds the learning set needed to construct decision trees. So far, about 40 percent of homoforms have been analysed, and a disambiguation accuracy of 25 percent has been achieved.
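The workflow the abstract outlines (measure how many word forms are ambiguous, then turn each homoform's context into attributes for a decision-tree learning set) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' program: the `Token` structure, the tag labels such as `verb.3sg`, and the choice of the word form and neighbouring tags as attributes are all hypothetical illustrations. The example sentence uses the third-person verb syncretism named above, where present-tense *eina* ("goes") has the same form in the singular and the plural.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical representation: a word form plus every tag the morphological
# analyser proposed for it. More than one tag means the form is a homoform.
@dataclass
class Token:
    form: str
    tags: List[str]


def ambiguity_rate(corpus: List[Token]) -> float:
    """Share of running word forms that received more than one tag."""
    if not corpus:
        return 0.0
    ambiguous = sum(1 for tok in corpus if len(tok.tags) > 1)
    return ambiguous / len(corpus)


def ambiguity_type(tok: Token) -> str:
    """Name a token's ambiguity by its sorted candidate tags, e.g. 'verb.3pl|verb.3sg'."""
    return "|".join(sorted(tok.tags))


def context_attributes(corpus: List[Token], i: int) -> Dict[str, str]:
    """Hypothetical attributes for the token at position i: its form, its own
    ambiguity type, and the (possibly ambiguous) tags of its neighbours."""
    def neighbour(j: int) -> str:
        return ambiguity_type(corpus[j]) if 0 <= j < len(corpus) else "<none>"

    return {
        "form": corpus[i].form.lower(),
        "ambiguity": ambiguity_type(corpus[i]),
        "prev": neighbour(i - 1),
        "next": neighbour(i + 1),
    }


def learning_set(corpus: List[Token], gold: Dict[int, str]) -> List[Tuple[Dict[str, str], str]]:
    """Pair the attributes of each homoform with its manually verified tag.
    Homoforms without a verified tag in `gold` are skipped."""
    rows = []
    for i, tok in enumerate(corpus):
        if len(tok.tags) > 1 and i in gold:
            rows.append((context_attributes(corpus, i), gold[i]))
    return rows


if __name__ == "__main__":
    # "Jis eina į mokyklą" ("He goes to school"): the analyser returns both
    # the singular and the plural third-person reading for 'eina'.
    sentence = [
        Token("Jis", ["pron.nom.sg"]),
        Token("eina", ["verb.3sg", "verb.3pl"]),
        Token("į", ["prep"]),
        Token("mokyklą", ["noun.acc.sg"]),
    ]
    print(ambiguity_rate(sentence))  # 0.25
    print(learning_set(sentence, {1: "verb.3sg"}))
```

Each row pairs a dictionary of categorical attributes with the correct tag, so the rows could be one-hot encoded (for example with scikit-learn's `DictVectorizer`) and passed to a decision-tree learner such as `DecisionTreeClassifier`. The attributes and learner the authors actually used may differ from this sketch.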

  • Issue Year: 2004
  • Issue No: 5
  • Page Range: 74-78
  • Page Count: 5
  • Language: Lithuanian