Eesti Wordnet'i hetkeseisust
Estonian Wordnet Today
Author(s): Sirli Parm, Heili Orav, Kadri Kerner
Subject(s): Language and Literature Studies
Published by: SA Kultuurileht
Keywords: Estonian Wordnet; wordnet; computational lexicography; human language technology
Summary/Abstract: This article describes the creation of the Estonian Wordnet and discusses the main problems that have been dealt with in recent years. At present, the Estonian Wordnet consists of more than 46,000 lexical units, which form more than 30,000 concepts (synsets). At its current stage, the Estonian Wordnet includes nouns, verbs, adjectives and adverbs. The concepts are connected by 43 different types of semantic relations. Two main criteria guide the choice of these relations: first, the need for relations specific to Estonian and, second, the need (with language technology applications in mind) for a rich network of semantic relations. The paper presents a list of Estonian-specific relations that should be present in the Estonian Wordnet, focusing mainly on adjectives and adverbs.

So far, our approach to enlargement has been mainly manual and domain-specific, i.e. we have gradually added semantic fields such as architecture, transportation and personality traits. There was also an attempt to enlarge the Estonian Wordnet semi-automatically by transferring around 3,000 new noun synsets from the Estonian Synonym and Antonym Dictionary. It turned out, however, that too many of these synsets had to be corrected, revised, merged or deleted, and the revision took too much time. The paper also describes the problems of adding multiword units and compound words to the Estonian Wordnet, since Estonian has an unlimited capacity for compounding; we found that it is important to add the most frequent ones. Another ongoing task is the inclusion of domain labels from WordNet Domains.

Besides enlarging the Estonian Wordnet, we have started revising the existing data. One of the problems is the revision of hierarchies; so far one study has been carried out, a check of the taxonomy of 'human being' in EstWN.
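The data model summarized above (lexical units grouped into synsets, synsets linked by typed semantic relations) and the kind of hierarchy check done for 'human being' can be illustrated with a minimal sketch. This is not the authors' implementation or the EstWN storage format: the `Synset` class, the relation label `has_hyperonym`, the synset ids and the three toy synsets are all hypothetical, invented here for illustration. The function walks hypernym links up to a root and reports cycles, one of the structural errors a taxonomy revision would look for.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """A concept: a set of synonymous lexical units plus typed outgoing relations."""
    synset_id: str
    literals: list[str]
    # Relation type -> target synset ids. Relation names here are hypothetical.
    relations: dict[str, list[str]] = field(default_factory=dict)


def hypernym_paths(synsets: dict[str, Synset], start_id: str,
                   rel: str = "has_hyperonym") -> list[list[str]]:
    """Return every path from start_id up to a root along `rel` links.

    Raises ValueError if a cycle is found in the hierarchy.
    """
    paths: list[list[str]] = []

    def walk(sid: str, path: list[str]) -> None:
        if sid in path:
            raise ValueError("cycle: " + " -> ".join(path + [sid]))
        path = path + [sid]
        parents = synsets[sid].relations.get(rel, [])
        if not parents:          # no hypernym: this synset is a root
            paths.append(path)
        for parent in parents:   # multiple hypernyms yield multiple paths
            walk(parent, path)

    walk(start_id, [])
    return paths


# Toy data for illustration only (not taken from EstWN).
toy = {
    "s1": Synset("s1", ["olend"]),                                # 'being, creature'
    "s2": Synset("s2", ["inimene"], {"has_hyperonym": ["s1"]}),   # 'human being'
    "s3": Synset("s3", ["õpetaja"], {"has_hyperonym": ["s2"]}),   # 'teacher'
}

for p in hypernym_paths(toy, "s3"):
    print(" > ".join(toy[s].literals[0] for s in p))  # prints: õpetaja > inimene > olend
```

Recursion keeps the sketch short; a resource the size of the Estonian Wordnet would more likely use an iterative traversal with memoized paths.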
Our future plans include the automatic addition of synsets generated by derivation, since Estonian is rich in derivatives. We will start with suffixes that are highly regular and then move on to cases that require morphological analysis and synthesis.
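Starting from highly regular suffixes can be sketched as a string rule that proposes candidates for manual review. The sketch assumes verbs are listed in the ma-infinitive citation form and uses the action-noun suffix -mine (e.g. lugema 'to read' → lugemine 'reading'). The choice of suffix, the function name `propose_derived_synsets` and the relation label `derived_from` are illustrative assumptions, not the authors' procedure. Verbs that do not match the rule are skipped; those are the cases the plan above leaves for morphological analysis and synthesis.

```python
def propose_derived_synsets(verb_literals, existing_literals):
    """Propose (noun, relation, verb) candidates using the -ma -> -mine rule.

    Hypothetical sketch: assumes ma-infinitive citation forms; candidates
    still need manual review before being added to a wordnet.
    """
    proposals = []
    for verb in verb_literals:
        if not verb.endswith("ma"):
            continue  # not a ma-infinitive: needs morphological analysis instead
        noun = verb[:-2] + "mine"  # e.g. tegema -> tegemine
        if noun in existing_literals:
            continue  # already present: nothing new to add
        proposals.append((noun, "derived_from", verb))
    return proposals


print(propose_derived_synsets(["lugema", "tegema", "jooksma"], {"lugemine"}))
# prints: [('tegemine', 'derived_from', 'tegema'), ('jooksmine', 'derived_from', 'jooksma')]
```

Checking against existing literals avoids duplicates, echoing the abstract's experience that automatically transferred synsets needed extensive revision.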
Journal: Keel ja Kirjandus
- Issue Year: LIV/2011
- Issue No: 02
- Page Range: 96-106
- Page Count: 11
- Language: Estonian