Analiza porównawcza miar podobieństwa tekstów opartych na macierzy częstości i tekstów opartych na wiedzy dziedzinowej
Comparative analysis of text documents similarity measures based on frequency matrix and based on domain knowledge
Author(s): Janusz Tuchowski, Katarzyna Wójcik
Subject(s): Economy
Published by: Wydawnictwo Uniwersytetu Ekonomicznego we Wrocławiu
Keywords: text mining; similarity; measure; frequency matrix; ontology
Summary/Abstract: The main objective of this paper is to attempt an evaluation of the usefulness of similarity measures for text documents. The measures best known from the literature are those based on a frequency matrix and those based on domain knowledge represented by ontologies. First, the documents used in the research are presented. Second, selected measures based on the frequency matrix are briefly described. The first part concludes with a simulation analysis based on those measures. The next part of the article is devoted to the results of a simulation analysis in which ontology-based measures are used. On this basis, an attempt is made to evaluate the usefulness of text similarity measures.
Journal: Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu
- Issue Year: 2012
- Issue No: 242
- Page Range: 396-405
- Page Count: 10
- Language: Polish