Author(s): Piotr Wierzchoń / Language(s): Polish
Issue: 124/2008
The article presents a procedure for retrieving multi-word units and sets it against other computational methods for retrieving multi-word idiomatic units: statistical measures such as T-score, Mutual Information and n-gramming on the one hand, and observation of structural determinants of idiomaticity on the other. In Polish and contrastive Polish-Russian linguistics, the main impulses for describing idiomatic combinations (multi-word units such as przepraszamy za usterki, czwórka z Liverpoolu, lista Wildsteina etc.) came from A. Bogusławski, and they were developed extensively by W. Chlebda’s group (retrieval, collection, the position of multi-word units in cultural contexts, reconstruction of authorship, semantic interpretation and even translation suggestions). Automatic methods can be divided into advanced statistical methods (n-gramming, see Yamamoto, Church 2001, Buczyński 2004) and “infantile” approaches (e.g. so-called quotation retrieval, cf., among others, Wierzchoń 2005). The present article puts forward a new method for retrieving idiomatic combinations, applied to a specific type of text: a reverse-order index of translation units. The reverse index lists translation pairs made up of Russian and Polish segments (words, phrases etc.), ordered according to the Russian and Polish alphabets, respectively. The method searches for reproducible components within those pairs. To some extent it relies on ready-made (lexicographically compiled) multi-word units; its contribution lies in presenting them in a way that makes the reproducibility of translation pair components relatively easy to observe.
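
The idea of looking for reproducible components inside a reverse index of translation pairs can be illustrated with a minimal Python sketch. It is not the article's implementation: the function names, the toy Russian-Polish pairs and the reading of "reverse index" as a list of pairs swapped and re-sorted by the Polish side are all assumptions made for illustration, and sorting uses Python's default code-point order rather than true Polish collation.

```python
from collections import defaultdict

def build_reverse_index(pairs):
    """Swap (russian, polish) pairs to (polish, russian) and sort by the Polish side.

    Assumption: plain code-point sorting stands in for Polish alphabetical order.
    """
    return sorted((pl, ru) for ru, pl in pairs)

def recurring_components(index, side=0):
    """Return words that occur in two or more entries on the chosen side.

    side=0 inspects the Polish segment, side=1 the Russian segment.
    """
    seen = defaultdict(list)
    for entry in index:
        # set() so that a word repeated inside one segment is counted once
        for word in set(entry[side].lower().split()):
            seen[word].append(entry)
    return {word: entries for word, entries in seen.items() if len(entries) > 1}

# Hypothetical toy data, not taken from the article.
pairs = [
    ("добрый день", "dzień dobry"),
    ("добрый вечер", "dobry wieczór"),
    ("спокойной ночи", "dobranoc"),
]

index = build_reverse_index(pairs)
polish_repeats = recurring_components(index, side=0)
russian_repeats = recurring_components(index, side=1)
```

In this toy index the word "dobry" recurs on the Polish side and "добрый" on the Russian side, which is the kind of repeated component a reader of such an index would inspect as a candidate for a reproducible unit.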