Latviešu valodas sintaktiski marķētā korpusa gramatikas modelis
The grammar model of Latvian Treebank

Author(s): Laura Rituma, Baiba Saulite, Gunta Nešpore-Bērzkalne
Subject(s): Morphology, Syntax, Computational linguistics, Baltic Languages
Published by: Latvijas Universitātes Akadēmiskais apgāds
Keywords: Latvian syntax; treebank; grammar model; dependency syntax; phrase structure grammar

Summary/Abstract: This paper describes the development of the Latvian Treebank and its grammar model. The corpus is the first syntactically annotated corpus for Latvian and currently contains approximately 13,000 annotated sentences. A hybrid dependency-constituency model was developed to describe Latvian syntactic constructions as accurately as possible. It augments the commonly used dependency grammars with phrase constructions for certain syntactic elements: analytical word forms and relations other than subordination. The grammar model is based on the idea of a syntactic nucleus, a functional syntactic unit consisting of content words or of syntactically inseparable units that are treated as a single unit. There are three kinds of phrase constructions in the Latvian Treebank grammar model: x-words, coordination constructions, and punctuation mark constructions. X-words are used for analytical forms, compound predicates, prepositional phrases, etc. Coordination constructions are used for coordinated parts of sentences and coordinated clauses. Punctuation mark constructions are used to annotate the different types of constructions that require punctuation in the sentence. The chosen annotation approach and data transformation systems ensure that the corpus is accessible to end users in two forms: the hybrid dependency-constituency model, which is suitable for research on syntactic phenomena in the Latvian linguistic tradition, and the multilingual Universal Dependencies model, which is better suited to certain computational linguistics systems. This work has received financial support from the European Regional Development Fund under grant agreement No. 1.1.1.1/16/A/219 (Full Stack of Language Resources for Natural Language Understanding and Generation in Latvian), in synergy with grant agreement No. 1.1.1.2/VIAA/1/16/188 (From Abstract Meaning Representation to Natural Language Sentence and Coherent Text Generation).
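
As a rough illustration of the hybrid model described in the abstract, the sketch below shows one way a tree could mix dependency links with the three kinds of phrase constructions. It is not the treebank's actual data format: the class names (`Token`, `Phrase`, `Node`), the `role` labels, and the English example sentence are all assumptions made up for this illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union


class PhraseKind(Enum):
    """The three phrase construction types named in the abstract."""
    X_WORD = "x-word"              # analytical forms, compound predicates, prepositional phrases
    COORDINATION = "coordination"  # coordinated parts of sentences and coordinated clauses
    PUNCTUATION = "punctuation"    # constructions that require punctuation marks


@dataclass
class Token:
    form: str


@dataclass
class Phrase:
    kind: PhraseKind
    constituents: List["Node"]     # constituency part: members of the phrase


@dataclass
class Node:
    # A node is either a single token or a phrase construction.
    content: Union[Token, Phrase]
    role: Optional[str] = None     # hypothetical role label, not the treebank's tag set
    dependents: List["Node"] = field(default_factory=list)  # dependency part


def iter_nodes(node):
    """Yield the node, then its phrase constituents, then its dependents (structural order)."""
    yield node
    if isinstance(node.content, Phrase):
        for constituent in node.content.constituents:
            yield from iter_nodes(constituent)
    for dependent in node.dependents:
        yield from iter_nodes(dependent)


def token_forms(node):
    """Collect the word forms of all tokens reachable from the node."""
    return [n.content.form for n in iter_nodes(node) if isinstance(n.content, Token)]


def leaf(form, role=None):
    return Node(Token(form), role)


# Invented example: "will read books and papers".
# The analytical form "will read" is an x-word; the object is a coordination phrase.
predicate = Node(Phrase(PhraseKind.X_WORD, [leaf("will"), leaf("read")]), role="predicate")
obj = Node(
    Phrase(PhraseKind.COORDINATION, [leaf("books"), leaf("and"), leaf("papers")]),
    role="object",
)
predicate.dependents.append(obj)

print(token_forms(predicate))
```

In this sketch a phrase construction can itself govern dependents, which is one reading of the "syntactic nucleus" idea: the x-word acts as a single unit in the dependency structure while keeping its internal constituents.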

  • Issue Year: 2019
  • Issue No: 10
  • Page Range: 200-216
  • Page Count: 17
  • Language: Latvian