Ke klasifikaci morfologických variant
On the classification of morphological variants
Author(s): Václav Cvrček, Vilém Kodýtek
Subject(s): Language and Literature Studies
Published by: AV ČR - Akademie věd České republiky - Ústav pro jazyk český
Keywords: corpus analysis; effect size; language production heterogeneity; morphological variation; Shannon entropy; statistical analysis
Summary/Abstract: After briefly discussing the heterogeneities inherent in language production and how they affect corpus evidence, we describe a scale for classifying individual morphological variants by their relative frequencies. The scale has recently been proposed independently in Mluvnice současné češtiny (2010) (A Grammar of Contemporary Czech, hereafter GCCz), of which we are co-authors, and in Bermel & Knittl (2012). Variants with a relative frequency between roughly 1% and 10% are classified by the respective authors as “sparse” and “marked”, and those occurring in less than roughly 1% of cases as “unexpected” and “isolated”. Another feature of the scale is the “equipollence” of the variants of a doublet whose relative frequencies lie between roughly 1/3 and 2/3 (for this criterion see also Štícha 2009). The scale in GCCz is heuristically based on Shannon entropy and applies to synchronic, functionally equivalent variants. Recently, R. Čech (2012) claimed to have revealed “a serious statistical deficiency” in GCCz. We show that this is a misunderstanding arising from his failure to distinguish between null-hypothesis statistical significance testing and effect size evaluation. We end with a brief note on the structure of the resources employed in GCCz.
Journal: Slovo a slovesnost
- Issue Year: 74/2013
- Issue No: 2
- Page Range: 139-145
- Page Count: 7
- Language: Czech
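The abstract's two technical points, the frequency bands and the difference between significance testing and effect size, can be illustrated with a small Python sketch. This is not the procedure from GCCz or from Čech (2012). The cut-offs are the approximate values quoted in the abstract, and the boundary handling (< vs <=) is an assumption. The function names and the label "unlabelled" for bands the abstract does not name are hypothetical. The chi-square test against an even 50/50 split is an illustrative choice of null-hypothesis test.

```python
import math

# Approximate cut-offs quoted in the abstract ("roughly"); treating them as
# half-open intervals is an assumption made for this sketch.
ISOLATED_MAX = 0.01
SPARSE_MAX = 0.10
EQUIPOLLENT_MIN, EQUIPOLLENT_MAX = 1 / 3, 2 / 3


def classify_variant(p: float) -> str:
    """Label one variant by its relative frequency p (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("relative frequency must lie in [0, 1]")
    if p < ISOLATED_MAX:
        return "unexpected/isolated"
    if p < SPARSE_MAX:
        return "sparse/marked"
    if EQUIPOLLENT_MIN <= p <= EQUIPOLLENT_MAX:
        return "equipollent"
    return "unlabelled"  # band not named in the abstract


def doublet_entropy(p: float) -> float:
    """Shannon entropy in bits of a doublet with variant frequencies p and 1 - p."""
    return -sum(x * math.log2(x) for x in (p, 1.0 - p) if x > 0.0)


def chi2_against_even_split(count_a: int, count_b: int) -> float:
    """Goodness-of-fit statistic (df = 1) for H0: both variants equally frequent."""
    return (count_a - count_b) ** 2 / (count_a + count_b)


if __name__ == "__main__":
    for p in (0.005, 0.05, 0.5):
        print(f"p={p}: {classify_variant(p)}, H={doublet_entropy(p):.3f} bits")
    # Keep the relative frequency (the effect size) fixed and let the sample grow:
    # the test statistic scales with N while the classification stays the same.
    for n in (100, 10_000):
        a = round(0.55 * n)
        print(n, classify_variant(a / n), chi2_against_even_split(a, n - a))
```

In the last loop, a 55:45 split is "equipollent" at both sample sizes. The chi-square statistic is 1.0 at N = 100 but 100.0 at N = 10,000, far above the usual 5% critical value of about 3.84 for df = 1. A large corpus can therefore make even a small departure from 50/50 "significant" without changing how the variants are classified by relative frequency.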