An Improved Thresholding Function and Sparse Subspace Decomposition for Speech Enhancement and Its Application to Speech Recognition


In this work, we propose an unsupervised monaural Arabic speech enhancement method based on two techniques. The main idea is to adapt the threshold value in the wavelet domain to the voicing state of the Arabic speech signal. To detect that state, we use our proposed voiced/unvoiced decision algorithm, which is based on Multi-scale Product (MP) analysis. The MP is the product of the wavelet transform coefficients at three successive dyadic scales. We then apply a denoising technique based on thresholding the discrete wavelet transform coefficients, with a threshold value that differs depending on whether the signal is voiced or unvoiced. In addition, we implement a post-processing technique based on subspace decomposition, in which the Fast Fourier Transform (FFT) of the obtained frames is decomposed into three subspaces: sparse, low-rank, and remaining noise components. Experimental results show that the proposed approach outperforms the speech enhancement methods it is compared against on noise-corrupted Arabic speech at low SNR levels. We also present evaluation results for automatic recognition of the enhanced Arabic speech. We reconstruct the clean Arabic speech from noisy observations using a sparse imputation technique, which employs a non-parametric model and finds the sparsest combination of exemplars that jointly approximates the reliable features of a noisy Arabic utterance.
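As a rough illustration of the MP analysis, the sketch below multiplies wavelet transform coefficients computed at three successive dyadic scales. It is a minimal sketch under stated assumptions, not the authors' implementation: the first-derivative-of-Gaussian wavelet, the scales 2, 4 and 8, the `support` width and the helper names `dog_wavelet` and `multiscale_product` are all hypothetical choices. The intuition is that a genuine singularity, such as a glottal closure in voiced speech, produces coefficients that line up at every scale, while noise coefficients are much less consistent across scales, so the product tends to emphasise the former.

```python
import numpy as np


def dog_wavelet(scale, support=4.0):
    """Hypothetical wavelet choice: first derivative of a Gaussian dilated by `scale`."""
    t = np.arange(-support * scale, support * scale + 1) / scale
    return -t * np.exp(-0.5 * t ** 2) / scale


def multiscale_product(x, scales=(2, 4, 8)):
    """Multiply wavelet transform coefficients at successive dyadic scales.

    Returns an array with the same length as `x`; `x` must be at least as
    long as the widest wavelet (2 * support * max(scales) + 1 samples).
    """
    x = np.asarray(x, dtype=float)
    mp = np.ones_like(x)
    for s in scales:
        w = dog_wavelet(s)
        if len(x) < len(w):
            raise ValueError("signal shorter than the widest wavelet")
        # mode="same" keeps each coefficient aligned with its input sample.
        mp *= np.convolve(x, w, mode="same")
    return mp
```

Using `mode="same"` keeps the product aligned sample by sample with the input, at the cost of edge effects near the frame boundaries.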
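A voicing-dependent wavelet denoiser could look like the following minimal sketch. It assumes an orthonormal Haar DWT (written out in NumPy so the example has no extra dependencies), a median-absolute-deviation noise estimate from the finest detail band, Donoho's universal threshold and soft thresholding. The `voiced_scale` and `unvoiced_scale` factors are hypothetical placeholders, not the values used in the paper; a smaller threshold for unvoiced frames is just one plausible choice, since noise-like unvoiced consonants are easily erased by aggressive thresholding. The `voiced` flag would come from a voicing detector such as the MP-based one.

```python
import numpy as np


def haar_dwt(x, levels):
    """Orthonormal Haar DWT; returns (approximation, [finest, ..., coarsest details])."""
    a = np.asarray(x, dtype=float)
    if len(a) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))
        a = (even + odd) / np.sqrt(2)
    return a, details


def haar_idwt(a, details):
    """Inverse of haar_dwt."""
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a


def denoise_frame(x, voiced, levels=4, voiced_scale=1.0, unvoiced_scale=0.5):
    """Soft-threshold all detail bands with a voicing-dependent threshold."""
    a, details = haar_dwt(x, levels)
    sigma = np.median(np.abs(details[0])) / 0.6745  # MAD noise estimate
    scale = voiced_scale if voiced else unvoiced_scale  # hypothetical factors
    lam = scale * sigma * np.sqrt(2.0 * np.log(len(x)))  # universal threshold
    shrunk = [np.sign(d) * np.maximum(np.abs(d) - lam, 0.0) for d in details]
    return haar_idwt(a, shrunk)  # approximation band is left untouched
```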
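The three-way split of the FFT frames can be illustrated with a GoDec-style alternation (after Zhou and Tao's "Go Decomposition"). This solver is a swapped-in stand-in, since the abstract does not name the one used: `L` is the best rank-`rank` approximation of the data minus the current sparse part, `S` keeps the `card` largest-magnitude entries of what remains, and the residual `G` plays the role of the remaining noise. The function name and the `rank`, `card` and `iters` parameters are hypothetical.

```python
import numpy as np


def lowrank_sparse_split(X, rank, card, iters=50):
    """GoDec-style split X = L + S + G (L low rank, S sparse, G remainder).

    Works for real or complex matrices, e.g. frames x FFT bins.
    """
    X = np.asarray(X)
    X = X.astype(np.result_type(X, np.float64))
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(iters):
        # Low-rank step: truncated SVD of the data with the sparse part removed.
        U, s, Vh = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vh[:rank]
        # Sparse step: keep only the `card` largest-magnitude residual entries.
        R = X - L
        S = np.zeros_like(X)
        idx = np.argpartition(np.abs(R).ravel(), -card)[-card:]
        S.flat[idx] = R.flat[idx]
    G = X - L - S  # whatever is neither low rank nor sparse
    return L, S, G
```

For FFT frames, `X` could be the complex matrix returned by `np.fft.rfft(frames, axis=1)`; the same code runs unchanged because `np.linalg.svd` and `np.abs` accept complex input.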
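Exemplar-based sparse imputation can be sketched as a non-negative, L1-regularised fit restricted to the reliable features, solved here with projected ISTA (iterative soft thresholding). This is an illustrative assumption rather than the paper's configuration: the exemplar dictionary `A`, the boolean `reliable` mask, the regularisation weight `lam`, the iteration count and the solver itself are hypothetical stand-ins. Once the sparse activation vector `x` is found, the unreliable features are replaced by the corresponding entries of `A @ x`, while the reliable ones are kept as observed.

```python
import numpy as np


def sparse_impute(y, reliable, A, lam=0.1, iters=10000):
    """Replace unreliable entries of `y` using a sparse, non-negative combination
    of exemplars (columns of `A`) fitted on the reliable entries only.

    Solves min_x 0.5 * ||A_r x - y_r||^2 + lam * sum(x) subject to x >= 0
    with projected ISTA, where A_r and y_r keep only the reliable rows.
    """
    y = np.asarray(y, dtype=float)
    A = np.asarray(A, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    Ar, yr = A[reliable], y[reliable]
    step = 1.0 / np.linalg.norm(Ar, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = Ar.T @ (Ar @ x - yr)
        # Gradient step, then the prox of lam*sum(x) on x >= 0 (shift and clip).
        x = np.maximum(x - step * (grad + lam), 0.0)
    y_hat = y.copy()
    y_hat[~reliable] = (A @ x)[~reliable]
    return y_hat, x
```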

Year: 2017
In session: Poster
Pages: 50 to 57