Principles of corpus querying: A discussion note

Author(s): Bálint Sass
Subject(s): Lexis, Semantics
Published by: Akadémiai Kiadó
Keywords: corpus; query; concordance; filter; frequency list; context

Summary/Abstract: Nowadays, it is quite common in linguistics to base research on data instead of introspection. Countless corpora (both raw and linguistically annotated) are available to us and provide the essential data. Corpora are large in most cases, ranging from several million to several billion words in size, which makes them clearly unsuitable for word-by-word investigation through close reading. Basically, there are two ways to retrieve data from them: (1) through a query interface or (2) directly by automatic text processing. Here we present principles on how to soundly and effectively collect linguistic data from corpora by querying, i.e. without the programming knowledge needed to manipulate the data directly. We discuss what is worth thinking about, which tools to use, what to do by default, and how to solve problematic cases. In sum, we show how to obtain correct and complete data from corpora for linguistic research.

  • Issue Year: 69/2022
  • Issue No: 4
  • Page Range: 599-614
  • Page Count: 16
  • Language: English