On the Optimal Set of Features and the Robustness of Classifiers in Radar-based Silent Phoneme Recognition


Silent speech recognition (SSR) is an active area of research with applications ranging from speech restoration to speech enhancement. Radar-based SSR has been proposed and investigated as a non-invasive method to infer vocal tract states and articulatory movements from measured changes in scattering parameters. One of the challenges in developing a radar-based SSR system is determining the optimal set of features from these measurements. In this study, we therefore investigated the following problems: (a) which features play the most significant role in classification; (b) how much each reflection and transmission spectrum contributes, and which frequencies are most important; (c) how the classifiers perform when using fewer features; and (d) how robust the classifiers are against different noise levels. The data used in this study consisted of 230 samples of 25 German phonemes (15 vowels, each in 10 contexts, and 10 consonants, each in 8 contexts) produced by two native German speakers. Using the full feature set, a Linear Discriminant Analysis (LDA) classifier achieved up to 94 % classification accuracy for speaker 1 and 84 % for speaker 2. Using only the most important features as identified by a decision tree, the classification accuracy deteriorated slightly in most conditions, but in one case it improved from 73.5 % to 81 %. Regarding robustness against noise, the accuracy of the LDA classifier dropped sharply with increasing noise levels, while the accuracy of the support vector machine (SVM) decreased less steeply.
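
The abstract does not state how the noise levels in problem (d) were generated. As a minimal sketch only, assuming additive white Gaussian noise scaled to a target signal-to-noise ratio (SNR), a feature vector could be perturbed as follows before being passed to a classifier. The function names `add_noise` and `measured_snr_db`, the SNR values, and the synthetic sine-wave "spectrum" are all hypothetical and are not taken from the study.

```python
import math
import random


def add_noise(features, snr_db, rng):
    """Return a copy of `features` with white Gaussian noise at `snr_db` dB.

    Assumption: noise is additive and scaled relative to the mean power
    of the feature vector; the study may have used a different scheme.
    """
    signal_power = sum(x * x for x in features) / len(features)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in features]


def measured_snr_db(clean, noisy):
    """Estimate the SNR in dB actually realised by a noisy copy."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Synthetic stand-in for one measured spectrum (not real radar data).
rng = random.Random(0)
clean = [math.sin(0.01 * i) for i in range(20000)]

# Hypothetical noise levels; each noisy copy would then be classified
# and the accuracy compared against the noise-free case.
for snr_db in (30, 20, 10, 0):
    noisy = add_noise(clean, snr_db, rng)
    print(snr_db, round(measured_snr_db(clean, noisy), 1))
```

Repeating the classification over such a sweep of noise levels would yield an accuracy-versus-noise curve for each classifier, which is the kind of comparison the abstract reports for the LDA and SVM classifiers.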

Year: 2021
In session: Automatische Spracherkennung (Automatic Speech Recognition)
Pages: 112 to 119