Internetikeele automaatne süntaktiline analüüs kitsenduste grammatikaga
Syntactic analysis of Estonian netspeak using constraint grammar
Author(s): Dage Särg
Subject(s): Applied Linguistics, Computational linguistics
Published by: Eesti Rakenduslingvistika Ühing (ERÜ)
Keywords: computational linguistics; natural language processing; syntax; dependency parsing; language variation; Estonian
Summary/Abstract: The paper gives an overview of an attempt to adapt the Estonian Constraint Grammar rule set for netspeak. The rule set was developed by Kaili Müürisep and Tiina Puolakainen for shallow and dependency parsing of standard written Estonian, and it has previously been adapted for shallow parsing of spoken Estonian by Kaili Müürisep and Heli Uibo. To adapt the rules, a chatroom corpus was first parsed with the existing rule set. The parsed corpus was then manually revised, and the rule set was changed on the basis of the errors found. The changes concerned the detection of clause boundaries and particle verbs, as well as the assignment of syntactic tags and dependency relations. The most distinctive features of netspeak proved to be the extensive use of discourse particles and direct addresses, short sentence length, a small percentage of attributes among the syntactic functions used in the text, and a large number of elliptical sentences, from which a predicate, in addition to other syntactic functions, can be omitted. After the rule set was adapted, the results of both shallow and dependency parsing improved. The most error-prone syntactic functions were subjects, predicatives, and adverbials. In dependency parsing, the largest number of errors occurred in determining the governors of adverbials.
Journal: Eesti Rakenduslingvistika Ühingu aastaraamat
- Issue Year: 2016
- Issue No: 12
- Page Range: 253-267
- Page Count: 15
- Language: Estonian