ESSV Konferenz Elektronische Sprachsignalverarbeitung

Title: Multi-condition Training and Adaptation for Noise Robust Speech Recognition

Authors: Ivan Kraljevski, Frank Duckhorn, Matthias Wolff, Rüdiger Hoffmann


In this paper, we investigate recognition performance on speech distorted by noise with unknown characteristics, as found in many real-world situations. Speech recognition performance degrades notably when the test and training speech data are mismatched. Various methods have been proposed to overcome this problem for a wide range of speech, speaker, channel and environmental conditions. The investigations presented in this paper use the UASR (Unified Automatic Speech Recognition and Synthesis) system to create three acoustic models and compare their noise robustness: a model trained on clean speech data, a model trained on multi-condition (MC) noisy data, and a clean model adapted on MC noisy speech data. The noise robustness of the models was evaluated by phoneme recognition on speech data with added noise of known types and SNR levels, as well as noise with unseen characteristics. As expected, recognition performance degraded significantly for the clean speech model, while the MC trained model achieved the best recognition performance among the compared models. The adapted model could also be used successfully for noisy speech recognition without requiring a large amount of adaptation data.
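The abstract does not describe how the noisy data were generated. As an illustration only, the following minimal sketch shows one common way to add noise to a clean signal at a target SNR. The function names (`power`, `mix_at_snr`, `measured_snr_db`), the sine tone standing in for speech, the Gaussian noise, and the SNR values 20, 10 and 0 dB are all assumptions made for this example; they are not taken from the paper or from the UASR system.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to give the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise is silent")
    # SNR_dB = 10 * log10(P_speech / (g^2 * P_noise)), solved for the gain g.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB of a mixture, given the clean signal it was built from."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10.0 * math.log10(power(speech) / power(residual))


# Hypothetical stand-ins: a 440 Hz tone for speech, white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# One noisy copy per (assumed) SNR level, as in a multi-condition data set.
noisy_sets = {snr: mix_at_snr(speech, noise, snr) for snr in (20, 10, 0)}
```

Under these assumptions, a multi-condition training set would pool such noisy copies across several noise types and SNR levels, while a test on unseen conditions would use a noise type held out from that pool.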

Year: 2012
In session: Spracherkennung (Speech Recognition)
Pages: 73 to 80