Zakon o veličini vokabulara teksta. Heapsov zakon i određivanje veličine vokabulara tekstova na hrvatskom jeziku
The Text Vocabulary Size Law. Heaps' Law and Determining Text Vocabulary Size in Croatian Language
Author(s): Miroslav Tuđman
Subject(s): Language and Literature Studies
Published by: Institut društvenih znanosti Ivo Pilar
Keywords: formula; vocabulary; texts
Summary/Abstract: The existing formula of Heaps' Law for the size of a text's vocabulary, Vr(n) = Kn^β, is not universal, so the law needs to be redefined before it can be used to analyse a corpus in a different language. Analysis of a corpus of texts in the Croatian language confirms the hypothesis that the number of functional items (F) in a text is a constant share of the text size n, amounting to 21% (compared with 26% in English texts). The author proves that the percentage of functional items in a text can be used as the value of the parameter K, and that K is constant for every language corpus. Empirical research has confirmed the author's thesis that the number of functional items in a text can be calculated with the formula F = nK/100, and that the value for the most frequent item (MF) can be obtained with the formula MF = n(K/100)^2. The value of the other parameter of Heaps' Law can also be determined accurately: β = log K/100. The author therefore proposes a new form of the text vocabulary size law: Vr(n) = (Kn)^β.
Journal: Društvena istraživanja - Časopis za opća društvena pitanja
- Issue Year: 14/2005
- Issue No: 75+76
- Page Range: 227-250
- Page Count: 24
- Language: Croatian
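
The formulas quoted in the abstract can be tried out with a short Python sketch. This is an illustration only, not the author's implementation: the function names, the sample text size, and the sample β value are assumptions made here. K is taken as the percentage stated in the abstract (21 for Croatian). β is passed in as a plain argument, because the abstract's expression β = log K/100 does not make the logarithm base or the grouping clear.

```python
def functional_items(n: float, k_percent: float) -> float:
    """Number of functional items, F = n*K/100 (K given as a percentage)."""
    return n * k_percent / 100


def most_frequent_item(n: float, k_percent: float) -> float:
    """Value for the most frequent item, MF = n*(K/100)^2."""
    return n * (k_percent / 100) ** 2


def heaps_vocabulary(n: float, k: float, beta: float) -> float:
    """Existing form of Heaps' Law, Vr(n) = K * n^beta."""
    return k * n ** beta


def proposed_vocabulary(n: float, k: float, beta: float) -> float:
    """Form proposed in the abstract, Vr(n) = (K*n)^beta."""
    return (k * n) ** beta


if __name__ == "__main__":
    n = 10_000         # hypothetical text size in word tokens
    k_croatian = 21    # percentage of functional items reported for Croatian
    beta = 0.5         # placeholder exponent, NOT a value from the article

    print("F  =", functional_items(n, k_croatian))
    print("MF =", most_frequent_item(n, k_croatian))
    print("Vr (proposed form) =", proposed_vocabulary(n, k_croatian, beta))
```

For a text of 10,000 tokens with K = 21, the first two formulas give F = 2,100 functional items and MF of about 441. The vocabulary estimates depend entirely on the β supplied, so the printed Vr value is only a placeholder.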