Tuning Sphinx to outperform Google’s speech recognition API

Abstract:

In this paper, we investigate whether the open-source speech recognizer Sphinx can be tuned to outperform Google’s cloud-based speech recognition API in a spoken dialog system task. To match this target domain, we use data from CMU’s Let’s Go bus information system, comprising 258k utterances of telephony speech recorded by Pittsburgh’s bus information dialog system. By training a domain-specific language model on this corpus and tuning a number of Sphinx’s parameters, we achieve a word error rate (WER) of 51.2%. This result is significantly lower than the WER produced by Google’s speech recognition API, whose language model is built on millions of times more training data.
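Both recognizers are compared by word error rate. The usual definition is WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum edit-distance alignment of the hypothesis against the reference, and N is the number of reference words. The sketch below assumes that standard definition. The abstract does not say which scoring tool was used, and the function names and example utterance are hypothetical. Over a corpus, errors and reference words are summed across all utterances before dividing, rather than averaging per-utterance rates.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of substitutions, deletions and insertions (S + D + I)
    needed to turn the reference word sequence into the hypothesis."""
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and hyp[:j]; row 0 means "empty reference", so j insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        words += len(ref)
    if words == 0:
        raise ValueError("references must contain at least one word")
    return errors / words


if __name__ == "__main__":
    # Hypothetical utterance: "the" is deleted, "downtown" is recognized as
    # "down" (substitution) and "town" is inserted: 3 errors over 7 words.
    print(round(corpus_wer([("when is the next bus to downtown",
                             "when is next bus to down town")]), 3))  # 0.429
```

Because insertions are counted, WER is not bounded by 100%. A hypothesis with many spurious words can exceed it, which is one reason high error rates on noisy telephony speech are not unusual.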


Year: 2014
In session: Spracherkennung (Speech Recognition)
Pages: 32 to 41