Somatosensory Feedback in PAULE

Abstract:

A somatosensory pathway is added to the Predictive Articulatory speech synthesis Utilizing Lexical Embeddings (PAULE) model. The design choices that lead to this specific somatosensory representation and pathway are discussed. PAULE is a continuously improved control model for the articulatory speech synthesizer VocalTractLab (VTL) that uses a meaning representation directly to find suitable motor trajectories and relies on no symbolic units, neither for the motor representation nor for the acoustic or semantic representation. The somatosensory representation consists of the minimal cross-sectional area within each of the most frontal 1-centimeter intervals of the oral cavity of the VTL, plus the incisor position, the tongue tip elevation, and the velum opening. In the somatosensory pathway, this 10-dimensional somatosensory representation serves as an intermediate representation before predictions in the acoustic and semantic goal spaces are compared against their targets. The semantic and acoustic error terms along the somatosensory pathway and along the acoustic pathway are summed together with an effort-minimization term on the control parameter (cp-)trajectories of the VTL to form an additive loss. This additive loss is minimized to plan optimal cp-trajectories that yield a copy-synthesis of a target acoustics with the VTL.
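As a reading aid, the additive loss described above can be written schematically as below, where the superscript marks the pathway (acoustic vs. somatosensory) along which the prediction is made and the subscript marks the goal space (acoustic vs. semantic) in which the error is measured. The abstract does not specify the error norms or any relative weighting of the terms, so this sketch leaves them abstract:

\mathcal{L}(\mathrm{cp}) = \mathcal{L}^{\mathrm{ac}}_{\mathrm{sem}}(\mathrm{cp}) + \mathcal{L}^{\mathrm{ac}}_{\mathrm{ac}}(\mathrm{cp}) + \mathcal{L}^{\mathrm{som}}_{\mathrm{sem}}(\mathrm{cp}) + \mathcal{L}^{\mathrm{som}}_{\mathrm{ac}}(\mathrm{cp}) + \mathcal{L}_{\mathrm{effort}}(\mathrm{cp})

Minimizing \mathcal{L} with respect to the cp-trajectories yields the planned movements used for copy-synthesis with the VTL.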


Year: 2023
In session: Speech Pathology
Pages: 119–126