Adaptation of Frequency Band Influence for Non-Native Speech Recognition

Abstract:

For voice-controlled car navigation systems, multilinguality is a major challenge. The requirement is clear: users drive to other countries and need to enter foreign city names, while they will most likely keep interacting in their native language for other commands. One important aspect is that the utterances these users produce differ from those of native speakers: they carry a non-native accent. Our work is motivated by two observations. First, people hear better at low frequencies and know that low frequencies are more important for producing understandable utterances in the foreign language, so they first aim to copy the low-frequency behavior of that language. Second, changes in the mid to high frequencies are caused by small tongue movements, and such subtle changes are hard for non-native speakers to control. Together, these two effects cause non-native speech to differ more strongly from native speech in the mid-range frequencies. We therefore analyze whether speech recognition for non-native speakers can be improved by lowering the influence of mid to high frequencies. We achieve this by increasing some of the variances of the Gaussians, which reduces the influence of differences in the corresponding frequency band on the likelihood output of a Gaussian. In this way we can model the selective mismatch between native training data and non-native test data.
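
The effect of widening variances can be illustrated with a small numeric sketch. It is not the authors' implementation: it assumes a single diagonal-covariance Gaussian and a hypothetical four-dimensional feature vector in which each dimension corresponds directly to one frequency band (ordered low to high), which the abstract does not specify. The names `inflate_variances`, `mid_high`, and `factor`, as well as all numeric values, are made up for illustration.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-likelihood of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def inflate_variances(var, band, factor):
    """Copy of var with the dimensions listed in `band` multiplied by `factor`."""
    return [v * factor if i in band else v for i, v in enumerate(var)]


# Assumption: one feature dimension per frequency band, low to high.
mean = [0.0, 0.0, 0.0, 0.0]
var = [1.0, 1.0, 1.0, 1.0]
mid_high = {2, 3}   # hypothetical indices of the mid/high bands
factor = 4.0        # hypothetical variance scaling factor

native_like = [0.0, 0.0, 0.0, 0.0]   # matches the model in every band
accented = [0.0, 0.0, 2.0, 2.0]      # deviates only in the mid/high bands

wide = inflate_variances(var, mid_high, factor)

# How much log-likelihood the accented frame loses relative to the
# native-like frame, before and after widening the variances.
penalty_before = (diag_gaussian_loglik(native_like, mean, var)
                  - diag_gaussian_loglik(accented, mean, var))
penalty_after = (diag_gaussian_loglik(native_like, mean, wide)
                 - diag_gaussian_loglik(accented, mean, wide))

print(penalty_before, penalty_after)
```

With these made-up numbers, the log-likelihood penalty for the mid/high-band deviation shrinks by the scaling factor, while a deviation in the low bands would be penalized exactly as before. In a real recognizer the features are often decorrelated transforms of the spectrum, so mapping a frequency band to specific variance entries would need more care than this sketch suggests.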


Year: 2008
In session: Spracherkennung (Speech Recognition)
Pages: 149 to 156