Towards a better understanding of TTS Synthesis: Subjective quality and its instrumental assessment

Abstract:

The purpose of this contribution is to give new insights into instrumental quality estimation of text-to-speech (TTS) signals. We focus on two main aspects: (1) What makes up the subjective quality of TTS signals from the native listener's perspective? (2) How can the subjective quality be measured instrumentally? Regarding the first question, typical impairments in TTS signals are identified based on a newly assembled German auditory test database comprising 14/15 state-of-the-art TTS systems for female/male voices. The results of a full semantic differential are described with emphasis on the potential to describe the quality space by means of a small number of quality dimensions. The second question addresses the development of a suitable feature set for signal-based estimation of subjective quality. We take up the idea of auditory-inspired modulation features, which have been shown to represent most of the articulatory information conveyed in speech signals. The potential for robust instrumental quality diagnosis is discussed.


Year: 2011
In session: Sprachsynthese-Evaluation und Prosodie
Pages: 91 to 98