Visualizing spoken corpus data on a map is an invaluable tool both at the stage of data collection (keeping track of numbers of speakers from different regions for corpus balancing purposes) and data exploration (examining the regional distribution of a sociolinguistic variable). Recently, a tool in this vein has been made available to Czech National Corpus users via the SyD application: a map summarizing the proportional usage of a given set of variants across the traditional dialect regions of Czech represented in the ORAL series corpora. The advantages of this new feature are discussed and examples highlighting how it can give an intuitive overview of dialectal variation are given. Current and future plans for other useful types of map-based visualizations of spoken corpus data are also presented.
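The proportional summary behind such a map can be illustrated with a minimal sketch. It assumes the input is a list of (region, variant) observations; the function name `variant_proportions`, the region labels and the toy data are hypothetical and are not taken from SyD or the ORAL corpora.

```python
from collections import Counter, defaultdict


def variant_proportions(observations):
    """Return {region: {variant: share}} from (region, variant) pairs."""
    counts = defaultdict(Counter)
    for region, variant in observations:
        counts[region][variant] += 1
    # Divide each variant count by the total number of hits in its region.
    return {
        region: {v: n / sum(c.values()) for v, n in c.items()}
        for region, c in counts.items()
    }


# Invented toy data (hypothetical regions, illustrative variant pair).
observations = [
    ("region_A", "mlýn"), ("region_A", "mlejn"),
    ("region_A", "mlejn"), ("region_A", "mlejn"),
    ("region_B", "mlýn"), ("region_B", "mlýn"),
]
shares = variant_proportions(observations)
```

Each region's shares sum to 1, so they could be drawn directly as a pie chart or shaded area per region; the actual visualization in SyD may be computed differently.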
The language of Czech classical prose writers has been well characterised so far. Recently, many Czech linguists have turned to the language of epistolography, focusing mainly on private letters because of, among other things, their immediacy and close relation to the spoken language. Our project analyses the letters written by and addressed to Karel Havlíček, a Czech journalist and prose writer of the mid-19th century. A preliminary corpus consists of 548 transliterated letters (approximately 250,000 text words), which is half of a total of more than 1,100 letters (of which about 350 are in German and several others in further languages). The letters in foreign languages have to be translated for the edition in preparation. The search for an appropriate equivalent and its selection from a set of competing means prompt the linguist to carry out research into mid-19th-century Czech. This paper describes the usage of words selected from the corpus, e.g. some kinds of conjunctions and particles, and reflects on the criteria for selecting appropriate equivalents for translation. Its wider aim is a description of mid-19th-century Czech based on reliable data and an indication of some possibilities for further research.
This paper works with data provided by the Czech National Corpus to consider the use of nepřizpůsobivý (inadaptable) by the Czech mainstream print media as a code word that is widely understood to signify a Roma citizen. The study shows that nepřizpůsobivý is used far more frequently in journalism than in other text genres and that its use has increased over the past decade. Examination of collocations reveals that nepřizpůsobivý typically is associated with negative reports on housing, residency and crime. This paper can also be seen as a case study to illustrate the usefulness of corpus data to critical discourse analysis and the role of the corpus in providing quantitative support to qualitative research in general.
This article engages in polemic with two papers on the status and prospects of corpus linguistics that were recently published by two Czech linguists in the journal Naše řeč (Our Language). These linguists claim that corpus linguistics relies too heavily on description, in general, and doesn’t provide sufficiently rigorous explanations. In contrast, the present author argues that working with large corpora (billions of tokens) does not necessarily lead to mere descriptions of language phenomena. Rather, descriptions based on large corpora facilitate rigorous explanations of grammatical phenomena. In addition, the author argues that until data-based descriptions became an integral part of work in the natural sciences, philosophically based explanations did not fully succeed at enabling us to understand the physical world. Language is a part of the natural world, and satisfactory grammatical explanations of natural languages require much more empirical evidence than could be obtained in the past without electronic corpora. Several examples of empirical evidence and their critical relevance to linguistic analysis are cited.
Monocollocable words are words and word forms that occur in only a single lexical combination, or in a very small, severely restricted and fixed number of combinations. In practice, they are found as parts of set idioms and multi-word terms. They occur in many languages, cf. English tenterhooks or Russian bakluši. Czech examples such as dát/dostat najevo, na viděnou, je mi líto, říct/mluvit/hrát nahlas, je známo, je zapotřebí, být třešničkou na dortu, není divu, jít/chodit pěšky, dát/dostat zadarmo illustrate this in more detail, showing at the same time that limited variation may be found too but, above all, that these are in fact not full-fledged words, since they lack most of the basic characteristics of words, such as meaning, word-class membership, etc. Given their severely limited combinatorial capacity, these words, less known under such alternative labels as cranberry words, form a substantial and irregular periphery of language and its lexicon. The contribution briefly comments on some of their aspects and suggests that, broadly, some classes or types can be recognized.
The study presents some corpus-based results on dispositional reflexive constructions in Czech. One type of Czech dispositional reflexives expresses the evaluation of a mental state with regard to an event associated with the predicate (Snad se mu žije lehčeji.). The evaluation of experiencing an underlying event in a particular way is usually encoded lexically by an adverbial phrase, the licensing conditions of which have not yet been investigated either within Western formal approaches (“manner adverb”, “some adverbial modification”) or within the Slavic grammatical tradition. Based on the analysis of corpus data, the formal (lexical, syntactic and prosodic) properties of this (explicit or implicit) adverbial argument will be examined. Then the frequency distribution of the positions of the adverbial argument within the clause (initial, preverbal, postverbal, final) will be established. Attention will also be given to the frequency of specific adverbs in both affirmative and negative clauses. Furthermore, an explanatory account of the adverbial selection in dispositional reflexives will be developed. When a reflexive verbal form is combined with an adverb (θ-identification), different main types of manner adverbs need to be distinguished, according to the semantic feature of ‘activation’ of the adverb [eval] and the θ-grid (<θ-Exp>, <θ- x>): obligatory evaluative (nerado ‘unwillingly’), facultative evaluative (výborně ‘excellently’), pragmatically evaluative (přirozeně ‘naturally’) and non-evaluative adverbs (rozhořčeně ‘indignantly’, rychle ‘quickly’). Within the class of evaluative adverbs, one can further differentiate between primary evaluative adverbs (dobře ‘well’) and secondary evaluative adverbs (vesele ‘cheerfully’), according to their semantic structure.
The behaviour of adverbs that do not define their pole as either positive or negative (normálně ‘normally’, nějak ‘somehow’), or those that can be tied contextually to both poles (zvláštně ‘oddly’), will be outlined briefly. Finally, the proposed paper will discuss three distinct functions of the modifier sám, -a, -o (‘alone’) in syntactic reflexives: autocausative, autoagentive and, last but not least, dispositional.
It is only with the aid of large corpora (several billion words) that we can discover some intuitively and spontaneously followed rules of grammar. Different kinds of ellipsis and non-ellipsis (repetition of a word or a nominal phrase which, under some conditions, can be omitted) can also be governed by such rules. Corpus findings for sentence structures such as (1) Zastavila se a podívala se na hodinky or (2) Zastavila se a podívala na hodinky (She stopped and looked at her watch) have clearly shown that ellipsis as well as repetition is a (strict) rule under specific semantic and syntactic conditions.
The article explores reported speech/thought in spoken Czech, especially reproductions introduced with various forms of říct/říkat (to say), with data provided by the Czech National Corpus. Most reproductions were introduced by the imperfective verb říkat (past and present tenses, first and third persons). By contrast, reproductions of thought were much less numerous and almost invariably involved the first person. We found twice as many examples of direct speech as of indirect speech, as well as interesting transitional forms, some of which can be described as free indirect speech. Pauses separating introductory constructions from reproductions appear to be more typical of direct than indirect speech, but are generally infrequent, suggesting a lower degree of segmentation of spoken language. Sometimes, reproductions of the speech of others were signalled with reduced introductory constructions, with verba dicendi substituted by signals other than verbs, whereas reproductions of one’s own speech were normally introduced with a verbum dicendi.
The present paper is a reply to the article Perspektivy korpusové lingvistiky: deskripce, nebo explanace by František Štícha (2015), which is a critique of recent studies by Radek Čech (2014) and Jan Chromý (2014). It is shown that Štícha’s argumentation is based on an inaccurate reading of the two criticized studies. Also, Štícha’s conception of corpus linguistics as a discipline which aims to capture the morphological and syntactic norm of well-educated people is rather limited. This narrow-minded view seems to be another reason for Štícha’s misunderstanding of the criticized papers.
The goal of this paper is to provide an overview of the structure and contents of the soon-to-be available ORAL corpus, which combines previously published corpora (ORAL2006, ORAL2008 and ORAL2013) with newly transcribed material into a single conveniently accessible and more richly annotated resource, about 6 million running words in length. The recordings and corresponding transcripts span a decade between 2002 and 2011; most of them capture interactions of mutually well-acquainted speakers, in informal situations and natural settings. The corpus is complemented by a marginal portion of more formal data, mostly public talks. It is tagged and lemmatized, and an effort was made to adapt existing tools (targeted at written language) to yield better results on spoken data. We hope the availability of such a resource will spawn further discussions on the morphological and syntactic analysis of spoken language, perhaps resulting in more radical departures in the future from the part-of-speech classification inherited from the linguistic analysis of written language.
The article compares the letters which Karel Havlíček wrote during his sojourn in Russia (1843–1844) with the set of his journalistic sketches Pictures from Russia (1843–1845). The mutual relations, similarities and differences between these two corpora of texts are described and analyzed. The letters include comments on the sketches as well as passages concerning the same topics as the sketches. Some formulations are identical, but the letters tend, among other things, towards informal expression and a concentration on the subjects of the addresser and addressee. Similarities between the letters and the sketches can be seen in the division of texts into three main components (observation, classification, evaluation), in the extensive use of Russian words and utterances, and in the confrontation of the Russian, German and Czech social and cultural spheres. The specific features of the letters are based on the communication characteristics of private correspondence (a well-known singular addressee, simple adding of pieces of information).
This contribution focuses on the orthography of the case forms of the personal pronoun já, namely the genitive/accusative (mě/mě) and dative/locative (mi, mně / mně) forms. These forms are often confused in writing: mně is written instead of mě, and mě instead of mně. The author considers the possible causes of this error. First, three are listed: 1) homophony, 2) insufficient knowledge of the orthographic rule, 3) misidentification of the case. He then raises the question of whether a further reason for the misspelling could be the tendency of language users to treat analogously those units that are close to each other in the language system (in the case of pronouns, primarily the pronoun ty). Through the observed phenomenon (the misspelling of mě and mně), the author intends to indicate his opinion on the question of so-called linguistic correctness. The main purpose of the text is to show how, according to the author, cases should be treated in which someone expresses themselves in a way inconsistent with codification and/or the norm and/or our linguistic awareness. In such cases, the essence of linguistic work should be the search for causes.
The necessity to distinguish between cognitive content and linguistic meaning arose in European structural linguistics (Saussure 1916) and was further elaborated in the Prague Linguistic Circle (Mathesius 1942; Dokulil – Daneš 1958; Daneš 1974). In this contribution, we describe the practical aspects of applying the principle of distinguishing meaning and content to the task of delimiting adverbial meanings expressed by prepositional groups. We present the methodology and the main principles we work with in completing the set of meanings of adverbials. We describe how we use the principle of substitutability of synonyms. All the examples in the contribution relate to spatial adverbials, but the principles apply to adverbials in general. Our theoretical framework is Functional Generative Description (Sgall et al. 1986).