ESSV Konferenz Elektronische Sprachsignalverarbeitung

Title: Alignment between rigid head movements and prosodic landmarks

Authors: Angelika Hönemann, Hansjörg Mixdorff, Sascha Fagel


In our study we recorded and analyzed an audiovisual speech corpus in order to develop a model that predicts head and facial non-verbal movements accompanying speech. The model is intended to improve the naturalness of avatars. Our previous paper gave a preliminary analysis of this corpus, which comprises acoustic and visual recordings of seven individual speakers, each talking for about three minutes about their last vacation. We showed that, for each speaker, 20-30% of the events in each motion class are aligned with prominent syllables in phrase-initial or phrase-medial position, and that the speakers moved most often at the end of an intonation phrase. We also observed that the speakers differ in the strength and frequency of visible events. However, a large proportion of motion events, about 60%, are not assigned to the target syllables. To account for this result, further analyses had to be carried out. The present paper reports these further analyses of the relationship between speech and movements. To this end, we extracted the fundamental frequency (F0) and the intensity of the acoustic signals using Praat. By marking the prominent syllables we obtained a description of the course of F0. We use Principal Component Analysis (PCA) to determine the linear combinations of the visual parameters that constitute the main head movements.
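The PCA step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the six rigid motion parameters (three rotations, three translations) and the random data standing in for tracked frames are assumptions for the example.

```python
import numpy as np

# Hypothetical stand-in for the visual parameters: rows are video frames,
# columns are rigid head motion parameters (assumed here to be three
# rotations and three translations).
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 6))

# Center each parameter, then take the SVD of the data matrix; the rows of
# Vt are the principal components, i.e. the linear combinations of the
# parameters that capture the most variance.
centered = frames - frames.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Variance explained by each component, as a fraction of the total.
explained_variance = s**2 / (len(frames) - 1)
explained_ratio = explained_variance / explained_variance.sum()

# Project the motion onto the first two components, which would correspond
# to the "main head movements" in the analysis.
scores = centered @ Vt[:2].T
print(scores.shape)  # (200, 2)
```

In practice one would keep as many components as needed to explain a chosen share of the variance, and inspect each component's loadings to interpret it as a movement type (e.g. a nod or a head turn).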

Year: 2013
In session: Prosodischer und multimodaler Ausdruck in der Mensch-Maschine Interaktion
Pages: 181 to 188