ESSV Konferenz Elektronische Sprachsignalverarbeitung

Title: Speaker Gender Classification based on an Improved Deep Learning Approach

Authors: Mohamed Anouar Ben messaoud, Aicha Bouzid


With the rapid evolution of technology, speaker gender and age classification has become a major problem for a wide range of applications in speech analysis and recognition. Speaker identification is now crucial in settings such as criminal suspect identification, speech recognition, speech emotion recognition, and computer-aided physiological applications. To improve the accuracy of speaker gender classification, we must generate robust features with a deep classifier. Given the promising results of machine learning on classification problems, our approach takes advantage of deep learning. In this paper, we propose a speaker gender classification method based on the Recurrent Neural Network (RNN), which can capture the long-term dependencies of a sequential speech signal. The most popular RNN variant is the Long Short-Term Memory (LSTM) model. However, its complex design makes it difficult to implement. We therefore refine the LSTM model into our proposed Simplified Gated Recurrent Units (SGRUs), an efficient architecture with only two multiplicative gates that is more suitable for speech classification. Our approach consists of two essential steps. First, we generate features from a trained SGRU-based model in which the reset gates are removed to limit redundancy and reduce the number of parameters without affecting system performance. Second, we use Rectified Linear Unit (ReLU) activations to learn long-term dependencies without slowing down the training process. In our architecture, we modify the level of dropout and increase the depth of the network. The architecture was evaluated on a challenging public database. Experimental results show that our approach achieves high accuracy, surpassing other recent gender classification methods.
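The gating idea described above can be illustrated with a minimal NumPy sketch. The abstract does not give the SGRU equations, so everything here is an assumption rather than the authors' model: the sketch takes a standard GRU update, drops the reset gate, and uses ReLU for the candidate state. The class name `SimplifiedGRUCell`, the weight names, the initialization, and the 13-dimensional input (for example, one frame of acoustic features) are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used for the update gate."""
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    """Rectified linear unit used for the candidate state."""
    return np.maximum(0.0, x)


class SimplifiedGRUCell:
    """Hypothetical GRU-style cell without a reset gate (not the paper's exact SGRU)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_size)  # assumed initialization range
        self.hidden_size = hidden_size
        # Update-gate parameters.
        self.W_z = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.U_z = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        self.b_z = np.zeros(hidden_size)
        # Candidate-state parameters (no reset gate multiplies h_prev here).
        self.W_h = rng.uniform(-scale, scale, (hidden_size, input_size))
        self.U_h = rng.uniform(-scale, scale, (hidden_size, hidden_size))
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        """Advance the hidden state by one time step."""
        z = sigmoid(self.W_z @ x_t + self.U_z @ h_prev + self.b_z)
        h_cand = relu(self.W_h @ x_t + self.U_h @ h_prev + self.b_h)
        # The update gate interpolates between the old state and the candidate.
        return z * h_prev + (1.0 - z) * h_cand

    def run(self, xs):
        """Process a (time, input_size) sequence and return the final state."""
        h = np.zeros(self.hidden_size)
        for x_t in xs:
            h = self.step(x_t, h)
        return h


# Usage: summarize 20 frames of 13-dimensional features into an 8-dim vector.
cell = SimplifiedGRUCell(input_size=13, hidden_size=8)
frames = np.random.default_rng(1).normal(size=(20, 13))
utterance_vector = cell.run(frames)
```

In a full system, a vector like `utterance_vector` would feed a classification layer; the dropout settings and network depth mentioned in the abstract are not modeled in this sketch.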

Year: 2020
In session: Poster
Pages: 193 to 198