Zakon o veličini vokabulara teksta. Heapsov zakon i određivanje veličine vokabulara tekstova na hrvatskom jeziku
The Text Vocabulary Size Law. Heaps' Law and Determining Text Vocabulary Size in Croatian Language

Author(s): Miroslav Tuđman
Subject(s): Language and Literature Studies
Published by: Institut društvenih znanosti Ivo Pilar
Keywords: formula; vocabulary; texts

Summary/Abstract: The existing formula of Heaps' Law for the size of a text's vocabulary, $V_r(n) = K n^{\beta}$, is not universal, so the law needs to be redefined before it can be used to analyse corpora in other languages. The analysis of a corpus of texts in the Croatian language confirms the hypothesis that the number of functional items ($F$) in a text is constant and amounts to 21% of the text size $n$ (compared with 26% in English texts). The author proves that the percentage of functional items in a text can be used as the value of the parameter $K$, and that $K$ is constant for every language corpus. Empirical research has confirmed the author's thesis that the number of functional items in a text can be calculated with the formula $F = nK/100$, and that the frequency of the most frequent item ($MF$) is given by $MF = n(K/100)^2$. The value of the other parameter of Heaps' Law can also be determined accurately: $\beta = \log K/100$. The author therefore proposes a new form of the text vocabulary size law: $V_r(n) = (Kn)^{\beta}$.
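
The formulas quoted in the abstract can be tried out with a short Python sketch. This is only an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, K is taken as a percentage (21 for Croatian, 26 for English, as in the abstract), and β is passed in as a plain argument because the abstract alone does not make the grouping of its β expression clear.

```python
# Illustrative sketch of the formulas quoted in the abstract.
# All function names are hypothetical; beta is supplied by the caller.

def functional_items(n: float, k_percent: float) -> float:
    """Number of functional items: F = n * K / 100."""
    return n * k_percent / 100


def most_frequent_item(n: float, k_percent: float) -> float:
    """Frequency of the most frequent item: MF = n * (K / 100)^2."""
    return n * (k_percent / 100) ** 2


def heaps_classic(n: float, k: float, beta: float) -> float:
    """Classic form of Heaps' Law: V(n) = K * n^beta."""
    return k * n ** beta


def heaps_proposed(n: float, k: float, beta: float) -> float:
    """Form proposed in the abstract: V(n) = (K * n)^beta."""
    return (k * n) ** beta


if __name__ == "__main__":
    n = 10_000  # text size in tokens (example value, not from the paper)
    for language, k in (("Croatian", 21), ("English", 26)):
        f = functional_items(n, k)
        mf = most_frequent_item(n, k)
        print(f"{language}: F = {f:.0f}, MF = {mf:.0f}")
```

For a 10,000-word text with K = 21, these formulas give roughly 2,100 functional items and a most frequent item occurring about 441 times.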

  • Issue Year: 14/2005
  • Issue No: 75+76
  • Page Range: 227-250
  • Page Count: 24
  • Language: Croatian