Perception of Formant Distortion in German Words and Non-words

Abstract:

Concatenative text-to-speech (TTS) systems remain a widely used, cheaper alternative to neural TTS systems. Yet the concatenation of prerecorded units has drawbacks, such as spectral distortion, whose perceptual consequences remain unclear. To help bridge this gap, our study examined the effect of spectral distortion in vowel formants on perceived speech quality in naturally read, manipulated German words and non-words. More specifically, we explored the effect of distortion applied to a varying number of formants, at different magnitudes and in different directions, in the two corner vowels /a:/ and /i:/. The results indicate that single-formant manipulations have a less pronounced effect on listeners’ perception than perturbations of multiple formants. The threshold at which the distortion became generally audible was estimated to lie between 0.4 and 1.0 bandwidths. The direction of the distortion was not found to have a significant effect.
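
The abstract does not state how the shifts were computed. As a minimal sketch, assuming that a distortion magnitude is a multiple of the affected formant's own bandwidth and that direction is simply the sign of the shift, the three manipulated factors (number of affected formants, magnitude, direction) could be expressed as follows. The function names and the formant values are hypothetical and serve only as illustration; they are not taken from the study.

```python
from typing import List, Set, Tuple


def shift_formant(freq_hz: float, bandwidth_hz: float,
                  magnitude: float, direction: int) -> float:
    """Shift one formant by `magnitude` bandwidths (assumed definition).

    direction: +1 shifts the formant upward, -1 shifts it downward.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return freq_hz + direction * magnitude * bandwidth_hz


def distort_vowel(formants: List[Tuple[float, float]], magnitude: float,
                  direction: int, affected: Set[int]) -> List[float]:
    """Return new formant frequencies; only indices in `affected` are shifted.

    formants: (frequency_hz, bandwidth_hz) pairs, index 0 = F1, 1 = F2, ...
    """
    return [
        shift_formant(f, bw, magnitude, direction) if i in affected else f
        for i, (f, bw) in enumerate(formants)
    ]


# Illustrative (not measured) F1 and F2 values for an open vowel.
example_vowel = [(750.0, 80.0), (1300.0, 100.0)]

# Single-formant manipulation: F2 lowered by 1.0 bandwidth.
single = distort_vowel(example_vowel, 1.0, -1, {1})

# Multiple-formant manipulation: F1 and F2 raised by 0.4 bandwidths each.
multiple = distort_vowel(example_vowel, 0.4, 1, {0, 1})
```

Expressing the magnitude relative to the bandwidth, rather than in absolute Hz, keeps the shift comparable across formants of different width, which is one plausible reading of a threshold reported in bandwidths.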


Year: 2024
In session: Phonetische Untersuchungen
Pages: 46 to 53