Real-time audio transcriber for language barrier-free classrooms

Huiyu Liu; Gokul Srinivasagan; Munir Georges

Abstract:

Language barriers in educational environments pose significant challenges for international students and educators, particularly when lectures must be transcribed in real time. While large-scale speech models such as Whisper demonstrate impressive capabilities, deploying them in resource-constrained settings remains challenging. This study develops a lightweight solution for real-time speech transcription and German-English translation through knowledge distillation and model compression techniques. By using the Whisper model to generate pseudo-labels and exploring various distillation strategies, we created compact models that maintain high performance while reducing computational demands. Our experiments show that a compressed model with approximately 40 million parameters achieves competitive word error rate (WER) and BLEU scores on both transcription and translation tasks. The resulting system, implemented using whisper.cpp, runs in real time with a real-time factor (RTF) below 0.5 in CPU-only environments, helping to mitigate language barriers in classroom settings.
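The abstract reports transcription quality as WER and speed as RTF. The snippet below is a minimal sketch of both metrics under their standard definitions: RTF is processing time divided by audio duration (so an RTF below 0.5 means the audio is processed in less than half its own length), and WER is the word-level edit distance divided by the number of reference words. It assumes plain whitespace tokenization with no text normalization and is not the authors' evaluation pipeline; the function names and the timing figures in the usage lines are hypothetical. BLEU is not shown.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio that was processed."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are split on whitespace; no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (row i-1 of the DP table)
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical timing: 60 s of lecture audio transcribed in 24 s on a CPU.
    print(real_time_factor(24.0, 60.0))  # 0.4, below the 0.5 threshold
    # One substituted word out of four reference words.
    print(word_error_rate("the lecture starts now", "the lecture start now"))  # 0.25
```

In a real evaluation the transcripts would usually be normalized (casing, punctuation, numerals) before scoring, since such choices can shift WER noticeably.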

Year: 2025
In session: Multilingual Speech and Language Data Processing
Pages: 146 to 154