Speech intelligibility prediction with hybrid auditory model- and ML-based methods: The best of two worlds?

Abstract:

This contribution reviews the use of a hybrid approach to modeling human speech recognition that combines an auditory-model-based frontend with a machine learning (ML)-based backend. The Framework for Auditory Discrimination Experiments (FADE [15]), for example, combines a physiology-inspired Gabor filter feature extraction with an HMM/GMM automatic speech recognition (ASR) system as backend. Its performance is evaluated against standard procedures for predicting the speech recognition threshold (SRT), i.e., the signal-to-noise ratio corresponding to 50% sentence intelligibility, for normal-hearing and hearing-impaired listeners in various interfering noise conditions. The results highlight the advantage of combining the best of two worlds: a model-based frontend that allows for an individualization strategy for the respective perception task given any individual hearing impairment, and an ML-based ASR backend that serves as a “generalized optimum detector”.
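To make the SRT definition above concrete, the following minimal sketch reads off the signal-to-noise ratio at which a measured or simulated recognition rate crosses 50%. It is not FADE's own procedure: the function name `estimate_srt`, the linear interpolation between neighboring SNR points, and the example values are all assumptions chosen for illustration.

```python
def estimate_srt(snrs_db, recognition_rates, target=0.5):
    """Return the SNR (dB) at which the recognition rate first crosses `target`.

    Hypothetical helper: linearly interpolates between the two neighboring
    SNR points that bracket the target rate.
    """
    # Sort the (SNR, rate) pairs by ascending SNR.
    points = sorted(zip(snrs_db, recognition_rates))
    for (snr_lo, rate_lo), (snr_hi, rate_hi) in zip(points, points[1:]):
        # Look for the first rising segment that brackets the target rate.
        if rate_lo <= target <= rate_hi and rate_hi > rate_lo:
            fraction = (target - rate_lo) / (rate_hi - rate_lo)
            return snr_lo + fraction * (snr_hi - snr_lo)
    raise ValueError("recognition rates do not cross the target rate")


# Illustrative (made-up) recognition rates at four SNRs.
example_snrs_db = [-12.0, -9.0, -6.0, -3.0]
example_rates = [0.1, 0.3, 0.7, 0.95]
srt_db = estimate_srt(example_snrs_db, example_rates)
print(f"Estimated SRT: {srt_db:.1f} dB SNR")
```

With these made-up values the 50% point lies halfway between -9 dB and -6 dB. A fitted psychometric function would usually replace the linear interpolation when the data are noisy.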


Year: 2022
In session: Models
Pages: 17 to 23