An open stylometric system based on multilevel text analysis
Author(s): Maciej Eder, Maciej Tomasz Piasecki, Tomasz Walkowiak
Subject(s): Semantics, Computational linguistics, Western Slavic Languages
Published by: Instytut Slawistyki Polskiej Akademii Nauk
Keywords: stylometry; Polish; CLARIN-PL; research infrastructure; language technology
Summary/Abstract: Stylometric techniques are usually applied to a limited number of typical tasks, such as authorship attribution, genre analysis, or gender studies. However, they could be applied to several tasks beyond this canonical set, if only stylometric tools were more accessible to users from different areas of the humanities and social sciences. This paper presents a general idea, followed by a fully functional prototype, of an open stylometric system that facilitates wide use through two aspects: technical flexibility and research flexibility. The system relies on a server installation combined with a web-based user interface, which frees the user from the necessity of installing any additional software. At the same time, the system offers a variety of ways in which the input texts can be analysed: these include not only the usual lexical level, but also deep-level linguistic features. This enables a range of possible applications, from typical stylometric tasks to the semantic analysis of text documents. The internal architecture of the system relies on several well-known software packages: a collection of language tools (for text pre-processing), Stylo (for stylometric analysis) and Cluto (for text clustering). The paper presents: (1) the idea behind the system from the user’s perspective; (2) the architecture of the system, with a focus on data processing; (3) features for text description; and (4) the use of analytical systems such as Stylo and Cluto. The presentation is illustrated with example applications.
Journal: Cognitive Studies | Études cognitives
- Issue Year: 2017
- Issue No: 17
- Page Range: 1-26
- Page Count: 26
- Language: English