The impact of mismatched recordings on an automatic-speaker-recognition system and human listeners

Author(s): Tomáš Nechanský, Tomáš Bořil, Alžběta Houzar, Radek Skarnitzl
Subject(s): Language and Literature Studies
Published by: Univerzita Karlova v Praze, Nakladatelství Karolinum
Keywords: forensic voice comparison; temporal mismatch; language mismatch; automatic speaker recognition; voice parade

Summary/Abstract: So-called ‘mismatch’ between recordings is a factor that experts in forensic voice comparison encounter regularly. We therefore explored to what extent the ability of an automatic-speaker-recognition system and of earwitnesses to identify speakers is affected when recordings are made in different languages and at different times. The 100 voices in a database of 300 recordings (100 speakers, each recorded in three mutually mismatched sessions) were compared both by the automatic-speaker-recognition software VOCALISE, based on i-vectors and x-vectors, and by 39 respondents in simulated voice parades. The automatic and perceptual approaches appear to have yielded similar results: the less complex the mismatch type, the more successful the identification. The results also point to the superiority of the x-vector approach over the i-vector approach, and to varying identification abilities among listeners.
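The abstract does not state how VOCALISE scores its comparisons. As a minimal sketch only, assuming speaker embeddings (such as i-vectors or x-vectors) have already been extracted and are compared by cosine similarity, a closed-set identification rate between two mismatched sessions could be computed as below. All names (`cosine`, `identification_rate`, `simulate_session`), the synthetic embeddings, and the use of Gaussian noise as a stand-in for mismatch severity are hypothetical illustrations, not the authors' method; VOCALISE may use a different scoring back end.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def identification_rate(enrol, test):
    """Closed-set rank-1 identification rate.

    enrol, test: dicts mapping speaker ID -> embedding, each taken from a
    different (mismatched) session. A test embedding counts as correctly
    identified when its highest-scoring enrolment embedding belongs to
    the same speaker.
    """
    correct = 0
    for spk, t in test.items():
        best = max(enrol, key=lambda s: cosine(enrol[s], t))
        correct += best == spk
    return correct / len(test)


def simulate_session(speakers, noise, rng):
    """Hypothetical: perturb each speaker's 'true' embedding with Gaussian
    noise; a larger noise level stands in for a more severe mismatch."""
    return {s: [x + rng.gauss(0, noise) for x in v] for s, v in speakers.items()}


rng = random.Random(0)
DIM = 32  # hypothetical embedding size, chosen only to keep the demo fast
speakers = {f"spk{i:03d}": [rng.gauss(0, 1) for _ in range(DIM)] for i in range(100)}
session_a = simulate_session(speakers, 0.3, rng)

# Synthetic data only: compare identification as the simulated mismatch grows.
for noise in (0.3, 1.0, 2.0):
    session_b = simulate_session(speakers, noise, rng)
    print(noise, identification_rate(session_a, session_b))
```

The printed rates come from random synthetic vectors and say nothing about the study's actual results; the sketch only shows how a "more severe mismatch, lower identification" pattern can be measured once embeddings exist.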

  • Issue Year: 2022
  • Issue No: 1
  • Page Range: 11-22
  • Page Count: 12
  • Language: English