Parameter Optimization for Administration-Specific Speech Transcription with the Faster Whisper System
Authors: Robin Bitterlich, Paul Böhm, Oliver Jokisch
Abstract:
Automatic speech recognition (ASR) is becoming increasingly important in public administration, particularly to support the creation of meeting minutes. Previous research has shown that open-source models such as Whisper [1] are suitable for processing administration-specific speech data [2]. However, they vary in runtime performance, accuracy, and resource efficiency [3]. Building on these findings, this study investigates Faster Whisper, an optimized ASR engine that allows targeted adjustment of inference parameters to balance quality and efficiency. The focus lies on the parameters beam size, patience, and batch size, whose influence on transcription quality (word error rate, WER) and runtime efficiency (real-time factor, RTF) is systematically analyzed. The objective is to identify parameter configurations that achieve an optimal trade-off between accuracy and speed, thereby supporting the practical and privacy-compliant use of an automatic transcription system in resource-constrained administrative environments [4].
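
For orientation, the two evaluation metrics named above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation pipeline used in the study: the function names `word_error_rate` and `real_time_factor` are hypothetical, and the sketch assumes whitespace tokenization with no text normalization (casing and punctuation are compared as-is). WER is taken as the word-level edit distance divided by the number of reference words, and RTF as processing time divided by audio duration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace and compared verbatim,
    without lowercasing or punctuation removal.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

Under these definitions, lower values are better for both metrics, and an RTF below 1 means the audio is transcribed faster than real time. Because a hypothesis can contain more insertions than the reference has words, WER can exceed 1.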


