Analiza porównawcza miar podobieństwa tekstów opartych na macierzy częstości i tekstów opartych na wiedzy dziedzinowej
Comparative analysis of text documents similarity measures based on frequency matrix and based on domain knowledge

Author(s): Janusz Tuchowski, Katarzyna Wójcik
Subject(s): Economy
Published by: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Keywords: text mining; similarity; measure; frequency matrix; ontology

Summary/Abstract: The main objective of this paper is to attempt an evaluation of the usefulness of similarity measures for text documents. The measures best known from the literature are those based on a frequency matrix and those based on domain knowledge represented by ontologies. First, the documents used in the research are presented. Second, selected measures based on the frequency matrix are briefly described. The first part concludes with a simulation analysis based on those measures. The next part of the article is devoted to the results of a simulation analysis in which ontology-based measures are used. On this basis, an attempt is made to evaluate the usefulness of text similarity measures.

  • Issue Year: 2012
  • Issue No: 242
  • Page Range: 396-405
  • Page Count: 10
  • Language: Polish