Automatic vocal tract segmentation based on conditional generative adversarial neural network

Abstract:

Speech production is characterized by high articulatory variability in both the spatial and the temporal domains. MRI over the last decades, and real-time MRI more recently, have proved particularly well suited to studying speech articulation. The data generated, however, require a large amount of processing, and their quantity rules out manual processing and calls for automatic segmentation methods. Deep learning now shows very promising results on many image processing problems, including segmentation. In this paper, the segmentation of the jaw, the tongue and the vocal tract is explored using a modified version of the pix2pix algorithm, which takes advantage of conditional generative adversarial networks. The experimental results are evaluated via a leave-one-out cross-validation scheme on midsagittal static MRI images of 10 subjects sustaining 62 different articulations. Both qualitative and quantitative assessments of the proposed method show promising and reliable performance and open the way for possible future work in speech articulatory modelling.
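
The leave-one-out evaluation described in the abstract could be organised roughly as in the sketch below. It rests on assumptions the abstract does not confirm: that the held-out unit is a whole subject, and that every subject contributes one image for each of the 62 articulations (620 images in total). The function name `leave_one_subject_out` and the `(subject_id, articulation_id)` indexing are hypothetical and serve only as illustration.

```python
from typing import Iterator, List, Tuple

# Hypothetical indexing: one image per (subject_id, articulation_id) pair.
Sample = Tuple[int, int]


def leave_one_subject_out(
    n_subjects: int = 10, n_articulations: int = 62
) -> Iterator[Tuple[int, List[Sample], List[Sample]]]:
    """Yield (held_out_subject, train_samples, test_samples) for each fold.

    Assumption: folds are formed per subject, so no image of the test
    subject is ever seen during training.
    """
    samples = [
        (subject, articulation)
        for subject in range(n_subjects)
        for articulation in range(n_articulations)
    ]
    for held_out in range(n_subjects):
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


folds = list(leave_one_subject_out())
for held_out, train, test in folds:
    # A segmentation model would be trained on `train` and scored on `test` here.
    print(f"fold {held_out}: {len(train)} training images, {len(test)} test images")
```

Under these assumptions each of the 10 folds trains on the images of 9 subjects and tests on the 62 images of the remaining subject, so the reported scores reflect performance on speakers unseen during training.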


Year: 2019
In session: Poster und Demonstrationen
Pages: 263 to 270