Ensembles of Classifiers for Parallel Categorization of Large Number of Text Documents Expressing Opinions

Author(s): František Dařena, Jan Žižka
Subject(s): Classification, Computational linguistics, ICT Information and Communications Technologies
Published by: Reprograph
Keywords: text documents; natural language; classification; parallel processing; ensembles of classifiers; machine learning;

Summary/Abstract: Opinions provided by people who have used services or purchased goods are a rich source of knowledge. Opinion classification, mostly performed with supervised classifiers, is one of the essential tasks. The technological capabilities of computers remain a major obstacle, especially when processing huge volumes of data. This study proposes and experimentally evaluates the application of parallelism to the classification of a very large number of contrary opinions expressed as freely written text reviews. Instead of training a single classifier on the entire data set, an ensemble of classifiers is trained on disjoint subsets of the data, and a group decision is used to classify unlabeled items. The main assessment criteria are computational efficiency and error rate, combined into a single measure so that ensembles of different sizes can be compared. Support vector machines, artificial neural networks, and decision trees, all frequently used classification methods, were examined. The paper demonstrates the viability of the suggested method when the number of text reviews leads to computational complexity beyond the capabilities of a contemporary common PC. Classification accuracy and the values of other classification performance measures (Precision, Recall, F-measure) did not decrease, which is a positive finding.
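
The ensemble scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the paper examined support vector machines, artificial neural networks, and decision trees, whereas this sketch substitutes a trivial word-count classifier so that it runs with the standard library alone. The round-robin split, the simple majority vote used as the "group decision", the toy reviews, and all function names are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_member(subset):
    """Train one stand-in classifier: per-label word counts from (text, label) pairs."""
    counts = {}
    for text, label in subset:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict_member(model, text):
    """Pick the label whose training word counts overlap the text most."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    # Iterate over sorted labels so ties are broken deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


def split_disjoint(data, n_parts):
    """Partition data into n_parts non-overlapping subsets (round-robin; an assumption)."""
    return [data[i::n_parts] for i in range(n_parts)]


def train_ensemble(data, n_members):
    """Train one member per disjoint subset; members are independent of each other."""
    subsets = split_disjoint(data, n_members)
    with ThreadPoolExecutor(max_workers=n_members) as pool:
        return list(pool.map(train_member, subsets))


def predict_ensemble(models, text):
    """Group decision, assumed here to be a simple majority vote."""
    votes = Counter(predict_member(m, text) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy reviews, invented for this sketch.
reviews = [
    ("great room and friendly staff", "pos"),
    ("dirty room and rude staff", "neg"),
    ("excellent breakfast great location", "pos"),
    ("terrible breakfast noisy location", "neg"),
    ("friendly service excellent value", "pos"),
    ("rude service terrible value", "neg"),
]

ensemble = train_ensemble(reviews, n_members=3)
print(predict_ensemble(ensemble, "great friendly excellent"))
print(predict_ensemble(ensemble, "rude terrible dirty"))
```

Because each member sees only its own subset, the training cost per member shrinks as the ensemble grows, which is the trade-off against error rate that the abstract's combined measure is meant to capture. A thread pool is used here only to show that the members can be trained independently; for CPU-bound training a process pool or separate machines would be the more realistic choice.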

  • Issue Year: XII/2017
  • Issue No: 47
  • Page Range: 25-28
  • Page Count: 4
  • Language: English