Values of Verbal Morphological Categories in the SYN2020 Corpus — the Verbtag Attribute

Hodnoty slovesných morfologických kategorií v korpusu SYN2020 — atribut verbtag
Values of Verbal Morphological Categories in the SYN2020 Corpus — the Verbtag Attribute

Author(s): Tomáš Jelínek, Vladimír Petkevič, Hana Skoumalová
Subject(s): Philology
Published by: Univerzita Karlova v Praze - Filozofická fakulta, Vydavatelství
Keywords: verbtag attribute; morphology of Czech verbs; morphological categories and values; automatic annotation; SYN2020 corpus

Summary/Abstract: The paper describes the verbtag attribute, which lets users of the SYN2020 corpus of contemporary Czech (and of the subsequent corpora SYNv9 and SYNv10) search for all values of the morphological categories of verbs: not only those contained in the tag attribute, but also those associated mainly with multiword participial verb predicates, which are prevalent in Czech. The verbtag attribute records whether the verb (co-)forming the verbal meaning is auxiliary or autosemantic, together with the verb's mood, diathesis, person, number and tense. The annotation applies both to verb predicates expressed in a single word (e.g., the 1st person singular present indicative: Čtu rád detektivní příběhy. ‘I like to read detective stories.’) and, especially, to verb predicates expressed in multiple words (e.g., the 1st person singular present conditional: Pak bych mu s chutí nabídla výhodnou smlouvu. ‘Then I would gladly offer him a good deal.’). The authors introduce the motivation for and the concept of the verbtag annotation, describe the relevant morphological categories and their values in detail, and show, through examples, how various multiword structures expressing verbal meaning are annotated in the verbtag attribute. They also give users a guide to the verbal morphosyntax reflected in the verbtag attribute and to ways of efficiently searching for and retrieving morphological and morphosyntactic data. The paper shows which multiword verbal complexes are straightforward to annotate and identifies harder cases (e.g., coordination of participles) that are difficult to annotate automatically, or for which the theoretically adequate annotation is unclear, or both. The authors also present the method used for annotating multiword verbal complexes and its current success rate.

  • Issue Year: 104/2022
  • Issue No: 1
  • Page Range: 89-109
  • Page Count: 21
  • Language: Czech