Klitikų paieškos lietuviškame tekste algoritmai
Algorithms for Detecting Clitics in the Lithuanian Text

Author(s): Pijus Kasparaitis, Tomas Anbinderis
Subject(s): Language and Literature Studies
Published by: Kauno Technologijos Universitetas
Keywords: clitics; automatic stressing; Lithuanian language

Summary/Abstract: This paper analyzes the task of automatically stressing Lithuanian text. Stressed text can be used in teaching the Lithuanian language, in text-to-speech synthesis systems, and elsewhere. In spoken language, some words, called clitics, are left unstressed and attach to stressed ones. The linguistic literature describes only general tendencies of clisis, whereas algorithms for detecting clitics in Lithuanian text, which human language technologies require, remain an entirely unresearched field. The paper reviews the factors that influence clisis and proposes methods for detecting clitics. The methods are based on 1) recognizing combinational forms, 2) the statistical frequency with which a word is stressed or unstressed, 3) some grammatical rules, and 4) the stressing of adjacent words. The second method is very simple and quite reliable, but the third and fourth methods gave better results for some classes of words. Word classes are defined, along with the method that suits each class best. The paper explains how to combine all the methods into one algorithm, which is designed to minimize the sum of type I and type II errors. Applied to the test data, the algorithm makes errors on 4.1% of all words, and the ratio of errors to unstressed words is 18.8%.
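As a rough illustration of the second method only (the statistical frequency of a word being stressed or unstressed), the following Python sketch labels a word as a clitic when it was unstressed in more than half of its occurrences in a hand-labelled corpus. This is not the authors' implementation: the function names, the 0.5 threshold, the treatment of unseen words as stressed, and the toy labels are all assumptions made for the example.

```python
from collections import Counter

def train_frequency_model(tagged_corpus):
    """Return, per word form, the share of occurrences labelled unstressed."""
    total = Counter()
    unstressed = Counter()
    for word, is_unstressed in tagged_corpus:
        key = word.lower()
        total[key] += 1
        if is_unstressed:
            unstressed[key] += 1
    return {w: unstressed[w] / total[w] for w in total}

def is_clitic(word, model, threshold=0.5):
    # Assumption: words never seen in training are treated as stressed.
    return model.get(word.lower(), 0.0) > threshold

def count_errors(tagged_corpus, model, threshold=0.5):
    """Return (words wrongly marked as clitics, clitics that were missed)."""
    false_clitics = missed_clitics = 0
    for word, gold in tagged_corpus:
        predicted = is_clitic(word, model, threshold)
        if predicted and not gold:
            false_clitics += 1
        elif gold and not predicted:
            missed_clitics += 1
    return false_clitics, missed_clitics

# Invented toy data: (word, True if unstressed in that occurrence).
toy_corpus = [
    ("ir", True), ("ir", True), ("ir", False),
    ("namas", False),
    ("tai", True), ("tai", False), ("tai", False),
]
model = train_frequency_model(toy_corpus)
errors = count_errors(toy_corpus, model)
```

On the corpus it was trained on, choosing the majority label for each word form minimizes the combined count of both error kinds for a decision based on the word form alone, which is one simple reading of the error-sum criterion mentioned in the abstract.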

  • Issue Year: 2007
  • Issue No: 10
  • Page Range: 30-37
  • Page Count: 7
  • Language: Lithuanian