Resynthesizing the GECO speech corpus with VocalTractLab


We address the challenge of learning an inverse mapping between acoustic features and the control parameters of a vocal tract simulator. As a first step, we synthesize an articulatory corpus consisting of control parameters and waveforms, using VocalTractLab (VTL; [1]) as the vocal tract simulator. The synthesis follows a concatenative approach that combines VTL gestures according to a SAMPA transcription. The SAMPA transcriptions are taken from the GECO corpus [2], a spontaneous speech corpus of southern German. The presented approach uses the phone durations and extracted pitch contours to create gesture files for VTL. Resynthesizing the GECO corpus yields 53960 valid spliced-out word samples, totalling 6 hours and 23 minutes of synthesized speech. The synthesis quality is mediocre. We believe that the synthesized samples reflect some of the variability found in natural human speech.
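To make the concatenative idea concrete, the following is a minimal sketch, assuming a phone-level input of (SAMPA symbol, duration) pairs and a sampled pitch track with 0 Hz marking unvoiced frames. It lays phones end to end on a timeline, assigns each to a vowel or consonant tier, and derives one F0 target per phone as the median of its voiced pitch samples. The `Gesture` class, the tier names, the vowel set, and `phones_to_gestures` are all hypothetical illustrations. This is not VTL's API or its gesture-file format, and not necessarily the authors' exact procedure.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical, non-exhaustive set of German SAMPA vowels and diphthongs.
SAMPA_VOWELS = {"a", "a:", "e:", "E", "E:", "I", "i:", "O", "o:", "U", "u:",
                "Y", "y:", "9", "2:", "@", "6", "aI", "aU", "OY"}


@dataclass
class Gesture:
    tier: str     # illustrative tier name: "vowel", "consonant" or "f0"
    value: str    # SAMPA symbol, or the F0 target in Hz as text
    start: float  # onset in seconds
    end: float    # offset in seconds


def phones_to_gestures(phones, f0_track):
    """Concatenate phones in time and attach a per-phone F0 target.

    phones:   list of (sampa_symbol, duration_in_seconds)
    f0_track: list of (time_in_seconds, f0_in_hz); 0 marks unvoiced frames
    """
    gestures = []
    t = 0.0
    for sampa, dur in phones:
        start, end = t, t + dur
        tier = "vowel" if sampa in SAMPA_VOWELS else "consonant"
        gestures.append(Gesture(tier, sampa, start, end))
        # Median of the voiced pitch samples that fall inside this phone.
        voiced = [f for time, f in f0_track if start <= time < end and f > 0]
        if voiced:
            gestures.append(Gesture("f0", f"{median(voiced):.1f}", start, end))
        t = end
    return gestures


# Usage: the word "gut" (SAMPA g u: t) with a short, partly unvoiced pitch track.
phones = [("g", 0.05), ("u:", 0.08), ("t", 0.07)]
f0_track = [(0.0, 0.0), (0.06, 120.0), (0.10, 125.0), (0.14, 0.0)]
for g in phones_to_gestures(phones, f0_track):
    print(g)
```

Taking a per-phone median keeps the sketch robust to octave jumps from the pitch tracker. A real resynthesis would also need to map each SAMPA symbol to the articulatory targets the simulator expects, which this sketch leaves out.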

Year: 2019
In session: Sprachsynthese
Pages: 95 to 102