ESSV Konferenz Elektronische Sprachsignalverarbeitung

Title: Perceptual quality dimensions of Text-to-Speech systems in audiobook reading tasks

Authors: Florian Hinterleitner, Christoph Norrenbrock, Sebastian Möller


In this paper we present research on the perceptual quality dimensions of text-to-speech systems in audiobook reading tasks. To this end, we proposed a newly developed evaluation protocol for assessing synthetic speech in audiobook reading tasks for the Blizzard Challenge 2012. We describe the experimental setup of the challenge's special audiobook reading task and analyze and interpret the results of the subjective listening test. A factor analysis yielded two quality dimensions. Based on the correlations between the rating-scale values and the factor scores, these dimensions were assigned to prosody & rhythm and to the listening pleasure of the user. This confirms the results of the previous study in which the current evaluation protocol was developed. In addition, a comparison with the perceptual quality dimensions of text-to-speech systems in other use cases revealed significant similarities.

Year: 2013
In session: Sprachsynthese (Speech Synthesis)
Pages: 44 to 49