A preliminary study in zero anaphora coreference resolution for Polish
Author(s): Adam Jan Kaczmarek, Michał Marcińczuk
Subject(s): Computational linguistics, Western Slavic Languages
Published by: Instytut Slawistyki Polskiej Akademii Nauk
Keywords: coreference; zero subject; zero anaphora coreference in Polish
Summary/Abstract: Zero anaphora is an element of the coreference resolution task that has not yet been directly addressed for Polish; most studies have left it, as the most challenging aspect, for further investigation. This article presents an initial study of the problem. We discuss the preparation of a machine learning approach and the engineering of features based on a linguistic study of the KPWr corpus. The study uses existing tools for Polish coreference resolution as sources of partial coreferential clusters containing pronoun, noun and named entity mentions. These tools also serve as baseline zero coreference resolution systems for comparison with our system. The evaluation covers not only clustering correctness, measured with the standard CoNLL-2012 measures without regard to mention type, but also the informativeness of the resulting relations. Under the coreference annotation approach used in the KPWr corpus, only named entities are treated as mentions informative enough to constitute a link to real-world objects. Consequently, we evaluate informativeness on the basis of the links found between zero anaphoras and named entities. For the same reason, we restrict coreference resolution in this study to mention clusters built around named entities.
Journal: Cognitive Studies | Études cognitives
- Issue Year: 2017
- Issue No: 17
- Page Range: 1-13
- Page Count: 13
- Language: English