Author(s): Radovan Garabík, Jana Wachtarczyková / Language(s): Slovak
Issue: 2/2024
In this paper, we analyze the occurrences of the lexeme vojna (war) in major Slovak language corpora. For the last 30 years, military topics have not entered the Slovak linguistic landscape, war being something remote either geographically or historically and thus not directly influencing daily life. We carry out both synchronic and diachronic analysis, comparing the number of occurrences in a web corpus and in three major subcorpora of the Slovak National Corpus, viz. newspaper texts, fiction and professional texts, as well as in the spoken corpus of Slovak. The occurrence of this and related military lexemes is relatively low in the web corpus, which might better reflect ordinary language. The occurrence is slightly higher in newspapers and fiction, while in the corpus of professional texts the lexeme appears significantly more often than in the others, largely due to scientific publications concerning history. The general spoken corpus shows an occurrence similar to that of the web corpus, but in the recordings of the Nation’s Memory Institute vojna has a significantly higher occurrence. For the diachronic analysis, we looked into several date-delimited corpora. In the corpus of texts covering the years from 1955 to 1989, the occurrence is also significantly higher. However, in the earlier period from 1843 to 1954, it is comparable to contemporary ordinary language. Thanks to the detailed annotation of the Slovak National Corpus, we are able to analyze the time dependence of the frequency of the lexeme in the three main subcorpora for the years 1955–2024. Although we expected a decline in these occurrences over time, the assumption was only partially confirmed, primarily in professional texts (possibly due to the growth of scientific fields other than history). In fiction and news texts, the frequency of the lexeme has significantly increased in the last decade.
Compared to historical texts from the second half of the 19th and the first half of the 20th century, neither the modern text corpora nor the spoken corpus differ significantly. We also use word embeddings and their visualization to explore the semantic grouping of words similar to vojna, and the results of an expected vector transfer from the masculine vojak (soldier) to its feminine counterpart. Despite the obvious and striking gender inequality in the modern military, the model does not show gender-biased results of the kind often exhibited by English-language word embedding models but not by Slovak ones. Finally, we use a large language model with a few-shot method to generate lexicographic definitions of the headword vojna, as an example of a possible path that future lexicography can take.