The Effect of Emotional Speech on Relative Speaker Discrimination

Abstract:

Text-independent speaker discrimination (SD) involves checking whether two arbitrary speech signals are uttered by the same speaker or by two different speakers. It has various applications, such as speaker verification and speech turn segmentation. However, emotionally colored speech introduces variations in the acoustic features that impair the performance of baseline speech technologies. This study investigates the influence of emotions on SD, applying an approach based on a relative characterization of the speaker, called the Relative Speaker Characteristic (RSC). The intrinsic variability is modeled using emotional utterances from the benchmark corpus Berlin Database of Emotional Speech. Three feature subsets based on Mel Frequency Cepstral Coefficients (MFCCs) are used to calculate the RSC, which represents the SD-specific information: F1 = {13 MFCCs}, F2 = F1 ∪ {delta coefficients}, and F3 = F2 ∪ {delta-delta coefficients}. Emotionally neutral utterances serve as training data. SD models are built using a Support Vector Machine with a linear kernel. The best SD performance is achieved with the RSC based on F1. For F1, the SD performance on utterances in the state of joy (EER = 6.6%), boredom (EER = 6.69%), and anger (EER = 7.61%) is comparable to that on emotionally neutral utterances (EER = 7.34%). However, for utterances in the state of fear (EER = 10.91%), disgust (EER = 23.76%), and sadness (EER = 25.76%), the SD performance is unreliable.
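For illustration, the following is a minimal sketch of how the three feature subsets F1, F2, and F3 described above could be computed and fed to a linear SVM. It is not the authors' implementation: the library choices (librosa, scikit-learn) and all function names are assumptions, and the RSC derivation itself is defined in the paper body rather than the abstract, so the classifier input is left as a commented placeholder.

    # Illustrative sketch only, assuming librosa and scikit-learn.
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def feature_subsets(wav_path, sr=16000, n_mfcc=13):
        """Return the frame-level feature matrices F1, F2, F3 (frames x dims)."""
        y, sr = librosa.load(wav_path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # F1: 13 MFCCs
        delta = librosa.feature.delta(mfcc)                     # first derivatives
        delta2 = librosa.feature.delta(mfcc, order=2)           # second derivatives
        f1 = mfcc                              # F1 = {13 MFCCs}
        f2 = np.vstack([mfcc, delta])          # F2 = F1 + delta coefficients
        f3 = np.vstack([mfcc, delta, delta2])  # F3 = F2 + delta-delta coefficients
        return f1.T, f2.T, f3.T

    # Hypothetical training step: X would hold one RSC vector per trial pair
    # (the RSC computation is specified in the paper, not the abstract), with
    # y = 1 for same-speaker pairs and y = 0 for different-speaker pairs.
    # X = np.stack([rsc(pair) for pair in training_pairs])
    # clf = SVC(kernel="linear").fit(X, y)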


Year: 2018
In session: Affective Speech
Pages: 216–223