Recognition of audio-visual attitudes
Authors: Phrashant Khatri, Hansjörg Mixdorff, Preeti Rao, Albert Rilliard
Abstract:
We investigate audiovisual features for classifying attitudes, a key aspect of communication whose recognition has been studied less than that of emotions. Using a German audiovisual dataset labeled with both speaker intention and perceived attitude, we test acoustic and visual features that have achieved state-of-the-art results in emotion recognition. Our classifiers perform significantly above chance on 16 attitudes, and their performance varies across attitudes and speakers in a way that closely matches human perceptual ratings. We emphasize the challenges of processing nuanced expressions compared to prototypical emotions. While audiovisual classification outperforms human raters in some respects, it falls short of fully exploiting the combined strengths of audio and visual cues. This study highlights the potential for improved cross-modal fusion and calls for further research on visual feature extraction in affective studies.