Towards an event annotated corpus of Polish
Author(s): Michał Marcińczuk, Marcin Oleksy, Tomasz Bernaś, Jan Kocoń, Michał Wolski
Subject(s): Language and Literature Studies, Theoretical Linguistics
Published by: Instytut Slawistyki Polskiej Akademii Nauk
Keywords: information extraction; event recognition; corpus annotation
Summary/Abstract: The paper presents a typology of events based on the TimeML specification and adapted to the Polish language. Some changes were introduced to the definitions of the event categories, and a motivation for the event categorization was formulated. The event annotation task is presented on two levels: the ontology level (language-independent) and the level of text mentions (language-dependent). The various types of event mentions in Polish text are discussed. A procedure for annotating event mentions in Polish texts is presented and evaluated. In the evaluation, a randomly selected set of documents from the Corpus of Wrocław University of Technology (KPWr) was annotated by two linguists and the inter-annotator agreement was calculated. The evaluation was done in two iterations. After the first iteration, the annotation procedure was revised and improved. The second iteration showed a significant improvement in the agreement between annotators. The current work focused on the annotation and categorisation of event mentions in text. Future work will focus on describing events with a set of attributes, arguments and relations.
Journal: Cognitive Studies | Études cognitives
- Issue Year: 2015
- Issue No: 15
- Page Range: 253-267
- Page Count: 15
- Language: English