Applying the speaking rate in a hierarchical classifier for emotion recognition from speech

Abstract:

Humans can easily estimate the speaking rate of a dialog partner in a conversation. The speaking rate can therefore be regarded as a rather obvious prosodic characteristic of human speech. In particular, it provides information on the emotional disposition of our dialog partners. However, most machines still lack such human abilities, and research activities have therefore started to focus more strongly on the emotional aspects of speech. In this paper we introduce a hierarchical classifier for emotion recognition from speech. In a two-step approach, a binary classification into low- and high-arousal emotions is first performed on the basis of the speaking-rate feature. A second classification step then determines the actual emotion. The hierarchical classifier consists of three Multi-Layer Perceptrons (MLPs) trained on cepstral turn-level features, while the speaking rates are determined by applying a broad phonetic class recognizer. We present results on the emotionally expressive EMO-DB corpus and compare them with results from a single MLP representing a flat approach without a hierarchical structure. An increase in accuracy of up to 3.0% in certain emotion categories is reported.
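The two-step routing described in the abstract can be sketched in code. The following is an illustrative reconstruction, not the authors' implementation: the class name, the use of scikit-learn's MLPClassifier, the hidden-layer sizes, and the binary arousal labels (0 = low, 1 = high) are all assumptions; the paper's actual feature extraction and network configurations are given in the full text. Stage one predicts arousal from the speaking-rate feature; stage two routes the cepstral turn-level features to the MLP of the predicted arousal branch.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def speaking_rate(num_phones, turn_duration_s):
        # Phones per second for one turn; the phone count would come
        # from a broad phonetic class recognizer, as in the paper.
        return num_phones / turn_duration_s

    class HierarchicalEmotionClassifier:
        """Sketch of a two-step classifier: speaking rate -> arousal -> emotion."""

        def __init__(self):
            # Three MLPs, mirroring the paper's design; the topologies
            # below are illustrative guesses, not the published ones.
            self.arousal_mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)
            self.low_mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000)
            self.high_mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000)

        def fit(self, cepstral, rates, emotions, arousal):
            cepstral = np.asarray(cepstral)
            emotions = np.asarray(emotions)
            rates = np.asarray(rates).reshape(-1, 1)
            # Step 1: binary low/high arousal from the speaking-rate feature.
            self.arousal_mlp.fit(rates, arousal)
            # Step 2: one emotion MLP per arousal branch, on cepstral features.
            high = np.asarray(arousal) == 1
            self.high_mlp.fit(cepstral[high], emotions[high])
            self.low_mlp.fit(cepstral[~high], emotions[~high])
            return self

        def predict(self, cepstral, rates):
            cepstral = np.asarray(cepstral)
            rates = np.asarray(rates).reshape(-1, 1)
            high = self.arousal_mlp.predict(rates) == 1
            out = np.empty(len(cepstral), dtype=object)
            if high.any():
                out[high] = self.high_mlp.predict(cepstral[high])
            if (~high).any():
                out[~high] = self.low_mlp.predict(cepstral[~high])
            return out

A flat baseline, for comparison, would simply fit a single MLPClassifier on the cepstral features and predict the emotion directly, with no arousal routing.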


Year: 2012
In session: Poster Sessions
Pages: 228–235