Automatinis švietimo ir mokslo terminų nustatymas lingvistiniais metodais
Automatic identification of science and education terms using linguistic methods
Author(s): Erika Rimkutė
Subject(s): Theoretical Linguistics, Lexis, Computational linguistics, Baltic Languages
Published by: Lietuvių Kalbos Institutas
Keywords: text; educational and scientific terms; linguistic methods; title form; grammatical form
Summary/Abstract: The paper deals with the possibilities and problems of automatic term extraction for Lithuanian, focusing on linguistic methods applied to the domain of Education and Science. Other researchers have shown that language-independent term extraction tools are almost impossible to build; this is especially true for tools based on linguistic rules. A linguistic term extraction tool should therefore incorporate methods that handle the grammatical system of the language in question. This paper presents a tool developed at the Centre of Computational Linguistics of Vytautas Magnus University that employs linguistic rules to extract domain-specific terminology. Automatic extraction of domain-specific terms requires some preparatory work: compiling a domain-specific corpus (a corpus of four million words was compiled for this research), annotating the corpus morphologically, formulating appropriate linguistic rules, and creating a methodology for filtering out irrelevant word combinations. The paper presents the linguistic rules used to extract Education and Science terms and the results of the extraction procedure. The identified terms are compared with the approved terms in the Term Bank of the Republic of Lithuania.
Journal: Terminologija
- Issue Year: 2012
- Issue No: 19
- Page Range: 54-69
- Page Count: 16
- Language: Lithuanian
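
The pipeline outlined in the abstract (a morphologically annotated corpus, linguistic rules, and a filtering step) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tag names, the patterns in `PATTERNS`, the stoplist in `STOP_LEMMAS`, and the function names are hypothetical and are not the rules or the tool described in the paper.

```python
from collections import Counter

# Hypothetical part-of-speech patterns for multi-word term candidates.
# The paper's actual linguistic rules are not given in the abstract.
PATTERNS = [
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("ADJ", "NOUN", "NOUN"),
]

# Hypothetical stoplist used to filter out irrelevant word combinations.
STOP_LEMMAS = {"kitas", "toks"}


def extract_candidates(tagged_sentence, patterns=PATTERNS, stop_lemmas=STOP_LEMMAS):
    """Return candidate terms from one sentence given as (lemma, tag) pairs.

    Candidates are built by joining lemmas, which is a simplification:
    the result is not necessarily a grammatical Lithuanian phrase.
    """
    tags = [tag for _, tag in tagged_sentence]
    candidates = []
    for pattern in patterns:
        n = len(pattern)
        for i in range(len(tags) - n + 1):
            if tuple(tags[i:i + n]) != pattern:
                continue
            lemmas = [lemma for lemma, _ in tagged_sentence[i:i + n]]
            # Filtering step: drop combinations containing a stoplisted lemma.
            if any(lemma in stop_lemmas for lemma in lemmas):
                continue
            candidates.append(" ".join(lemmas))
    return candidates


def count_candidates(corpus):
    """Count candidate terms over a corpus (an iterable of tagged sentences)."""
    counts = Counter()
    for sentence in corpus:
        counts.update(extract_candidates(sentence))
    return counts


# Toy input with hand-assigned lemmas and tags (not real annotator output).
sentence = [
    ("aukštasis", "ADJ"),
    ("mokykla", "NOUN"),
    ("būti", "VERB"),
    ("kitas", "ADJ"),
    ("institucija", "NOUN"),
]
print(extract_candidates(sentence))
```

In this toy input the second adjective-noun pair is discarded by the stoplist, so only the first pair survives as a candidate. A real system would also need to restore a grammatical citation form for each candidate and compare the candidates against a reference term list.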