Principles of corpus querying: A discussion note

Bálint Sass

Author(s): Bálint Sass
Subject(s): Lexis, Semantics
Published by: Akadémiai Kiadó
Keywords: corpus; query; concordance; filter; frequency list; context

Summary/Abstract: It is now common in linguistics to base research on data rather than introspection. Countless corpora, both raw and linguistically annotated, are available and provide the essential data needed. Most corpora are large, ranging from several million to a few billion words, and are clearly not suited to word-by-word close reading. There are two basic ways to retrieve data from them: (1) through a query interface or (2) directly by automatic text processing. Here we present principles for collecting linguistic data from corpora soundly and effectively by querying, i.e. without the programming knowledge needed to manipulate the data directly. We discuss what is worth thinking about, which tools to use, what to do by default, and how to solve problematic cases. In sum, we show how to obtain correct and complete data from corpora for linguistic research.

Journal: Acta Linguistica Academica. An International Journal of Linguistics (Until 2016 Acta Linguistica Hungarica)

Issue Year: 69/2022
Issue No: 4
Page Range: 599-614
Page Count: 16
Language: English
