ESSV Konferenz Elektronische Sprachsignalverarbeitung

Title: Filtering-Based Analysis of Spectral and Temporal Effects of Room Modes on Low-Level Descriptors of Emotionally Coloured Speech

Authors: Martin Gottschalk, Juliane Höbel-Müller, Ingo Siegert, Jesko L. Verhey, Andreas Wendemuth


Emotion recognition in far-field speech is challenging due to various acoustic factors. This contribution focuses on dominant low-frequency room modes, which are common in small rooms and cause the low-frequency acoustic response to vary across listening locations. The impact of this spatial variation on low-level descriptors, which form the feature sets used in speech emotion recognition, has not yet been analysed in detail. This paper addresses that gap using the well-known benchmark dataset EMO-DB, which provides high-quality emotionally coloured speech. The measured room response of a speaker cabin is compared with artificial approximations of its frequency response in the low-frequency range. Two techniques were applied to obtain the approximations. The first uses multiple resonant filters in the low-frequency region, whose parameters are determined by a least-squares fit. The second uses a modified version of the cabin’s amplitude spectrum, which was set to unity at higher frequencies and then transformed to minimum phase and into the time domain. To identify the impact of room modes on the low-level descriptors, correlation coefficients between the “clean” and modified EMO-DB utterances are calculated and compared. Furthermore, a speech emotion recognition system is used to assess the impact on recognition performance.
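
The second technique can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a standard cepstral (homomorphic) minimum-phase reconstruction, and it uses a synthetic single-mode impulse response in place of the measured cabin response. The function names, the 16 kHz sampling rate, the 80 Hz mode and the 300 Hz cutoff are all hypothetical choices made for illustration.

```python
import numpy as np

def low_freq_only_magnitude(h, n_fft, fs, cutoff_hz):
    """Amplitude spectrum of h, set to unity above cutoff_hz (hypothetical helper)."""
    mag = np.abs(np.fft.fft(h, n_fft))
    freqs = np.abs(np.fft.fftfreq(n_fft, d=1.0 / fs))
    mag[freqs > cutoff_hz] = 1.0  # keep only the low-frequency shaping
    return mag

def min_phase_ir(mag):
    """Minimum-phase impulse response for a symmetric magnitude spectrum
    sampled on a full, even-length FFT grid (cepstral folding method)."""
    n_fft = len(mag)
    assert n_fft % 2 == 0, "this sketch handles even FFT lengths only"
    # Real cepstrum of the log-magnitude; the floor avoids log(0).
    cep = np.fft.ifft(np.log(np.maximum(mag, 1e-8))).real
    # Fold the anti-causal part of the cepstrum onto the causal part.
    fold = np.zeros(n_fft)
    fold[0] = 1.0
    fold[1:n_fft // 2] = 2.0
    fold[n_fft // 2] = 1.0
    return np.fft.ifft(np.exp(np.fft.fft(cep * fold))).real

# Toy stand-in for a measured response: direct sound plus one decaying 80 Hz mode.
fs = 16000          # assumed sampling rate
n_fft = 8192
n = np.arange(n_fft)
h_toy = np.zeros(n_fft)
h_toy[0] = 1.0
h_toy += 0.05 * np.exp(-n / (0.2 * fs)) * np.sin(2 * np.pi * 80.0 * n / fs)

target_mag = low_freq_only_magnitude(h_toy, n_fft, fs, cutoff_hz=300.0)
h_min = min_phase_ir(target_mag)

# Magnitude of the reconstructed filter, for comparison with the target.
recon_mag = np.abs(np.fft.fft(h_min))
```

The resulting `h_min` could then be applied to an utterance by convolution, for example with `np.convolve(utterance, h_min)`, so that only the low-frequency room-mode colouration is imposed on the speech.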

Year: 2020
In session: Poster
Pages: 219 to 226