ESSV Konferenz Elektronische Sprachsignalverarbeitung

Title: Analysis and categorization of corrections in multilingual spoken dialogue system

Authors: Ivan Kraljevski, Diane Hirschfeld


Human-machine conversation poses many challenges, and communication errors remain ultimately unavoidable. It is therefore of great importance to facilitate the detection and correction of miscommunication. A robust dialogue system has to be able to detect miscommunication and to apply appropriate recovery and error-handling strategies. This is only possible if the system can recognize problematic communication by analyzing and classifying correction dialogue acts. The changes in speaking style associated with corrections are characterized by distinctive prosodic features. They are mostly correlated with hyperarticulated speech, which can be used as a cue to identify problematic situations. In this paper, we analyze, categorize, and detect distinctive acoustic-prosodic features of corrections in 13 different languages. The statistical analysis shows that the features related to hyperarticulated speech are significantly related to both the language and the type of correction. In general, speakers raised their voice in the case of a request to repeat the last utterance, but they did the opposite in the case of insertions. In addition, the speech rate was slower in misrecognition clarifications. We also present the results of classification experiments on corrections that combine acoustic-prosodic feature analysis with machine learning. The datasets are characterized by a small number of unbalanced classes and a small amount of training data per class. Support Vector Machines and Artificial Neural Networks were employed for multi-class and binary classification. The results are analyzed and compared in terms of unweighted accuracy, precision, recall, and F1 score.
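Because the classes are unbalanced, the choice of evaluation metric matters. The following minimal Python sketch illustrates how per-class precision, recall, and F1 score can be computed, together with an unweighted accuracy. It assumes that "unweighted accuracy" means the mean of the per-class recalls, as is common in speech classification; the abstract does not define the term. The function names, class labels, and toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the true class but was missed
    metrics = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally (assumed definition)."""
    metrics = per_class_metrics(y_true, y_pred)
    true_classes = set(y_true)
    return sum(metrics[c][1] for c in true_classes) / len(true_classes)


# Toy, made-up labels: one majority class and two minority classes.
y_true = ["repeat"] * 2 + ["insertion"] * 2 + ["clarification"] * 6
y_pred = (["repeat", "insertion"] + ["insertion"] * 2
          + ["clarification"] * 5 + ["repeat"])

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy={plain_accuracy:.3f}  unweighted={uar:.3f}")
```

On this toy data the plain accuracy is higher than the unweighted value, because the majority class dominates the plain average while each class contributes equally to the unweighted one.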

Year: 2019
In session: Dialogsysteme
Pages: 50 to 57