Implementing Easy-to-Use Recipes for the Switchboard Benchmark

Abstract:

We report on our contribution of templates for tokenization, language modeling, and automatic speech recognition (ASR) on the Switchboard benchmark to the open-source, general-purpose toolkit SpeechBrain. We implemented three recipes for training end-to-end ASR systems and describe their model architectures as well as the necessary data preparation steps. The word error rates achievable with our models are comparable to or better than those of other popular toolkits. Pre-trained ASR models are available on HuggingFace, where they can be easily integrated into research projects or used directly for quick inference via a hosted inference API.
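
For readers unfamiliar with the metric named above, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. It is an illustration of the standard definition only, not the scoring code used in the recipes; the function name `word_error_rate` is hypothetical, and benchmark scoring usually also applies text normalization, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.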


Year: 2023
In session: Automatic Speech Recognition
Pages: 150 to 157