Success Rates in Most-frequent-word-based Authorship Attribution. A Case Study of 1000 Polish Novels from Ignacy Krasicki to Jerzy Pilch
Author(s): Jan Rybicki
Subject(s): Language studies, Language and Literature Studies, Theoretical Linguistics, Applied Linguistics
Published by: Wydawnictwo Uniwersytetu Jagiellońskiego
Keywords: multivariate analysis; authorship attribution; Polish literature; stylometry
Summary/Abstract: The success rate of authorship attribution by multivariate analysis of most-frequent-word frequencies is studied in a 1000-novel corpus of Polish literary works from the late 18th to the early 21st century. The results are examined for possible influences of the number of authors and/or the number of texts to be attributed. The success rates achieved in this study are also compared to those obtained in earlier studies of smaller corpora, which were perhaps too small to produce regular patterns. This study shows that text sets of this size confirm the intuitive predictions as to those influences: 1) the more authors, the less successful the attribution; 2) for the same number of authors, the number of texts to be attributed does not influence the success rate.
Journal: Studies in Polish Linguistics
- Issue Year: 10/2015
- Issue No: 2
- Page Range: 87-104
- Page Count: 18
- Language: English