The impact of mismatched recordings on an automatic-speaker-recognition system and human listeners
Author(s): Tomáš Nechanský, Tomáš Bořil, Alžběta Houzar, Radek Skarnitzl
Subject(s): Language and Literature Studies
Published by: Univerzita Karlova v Praze, Nakladatelství Karolinum
Keywords: forensic voice comparison; temporal mismatch; language mismatch; automatic speaker recognition; voice parade
Summary/Abstract: So-called ‘mismatch’ is a factor that experts in forensic voice comparison encounter regularly. We therefore explored to what extent the ability of an automatic-speaker-recognition system and of earwitnesses to identify speakers is affected when recordings are acquired in different languages and at different times. One hundred voices in a database of 300 recordings (100 speakers, each recorded in three mutually mismatched sessions) were compared by the automatic-speaker-recognition software VOCALISE, based on i-vectors and x-vectors, and by 39 respondents in simulated voice parades. Both the automatic and the perceptual approach seem to have yielded similar results in that the less complex the mismatch type, the more successful the identification. The results point to the superiority of the x-vector approach and to varying identification abilities among listeners.
Journal: Acta Universitatis Carolinae Philologica
- Issue Year: 2022
- Issue No: 1
- Page Range: 11-22
- Page Count: 12
- Language: English