ESSV Konferenz Elektronische Sprachsignalverarbeitung

Title: The computational architecture of Elija: a model of a young child that learns to pronounce

Authors: Ian S. Howard, Piers Messum


We describe the architecture and operation of Elija, a computational infant that learns to pronounce speech sounds. Elija is modelled as an agent who can interact with his environment but who has no a priori articulatory or perceptual knowledge of speech. His sensory system responds to touch and acoustic input. He judges the value of actions and responses using a reward mechanism, and can associate and remember the correspondences between his actions, their reward, and prior and subsequent sensory inputs. Elija first develops the ability to babble using unsupervised learning, which is formulated as an optimization problem. He then takes advantage of tutored interactions with his caregivers. These interactions consist of naturalistic exchanges in which the caregivers reformulate Elija’s output. He uses the reformulations to learn the importance of his productions, a process that selects for good productions and discards poor ones. In addition, through associative memory, the reformulations build up a correspondence between his output and adult speech sounds. This allows Elija to imitate words spoken by the caregiver: he parses the input with a dynamic time warping (DTW) recognizer that uses previously heard reformulations as its templates. He thereby identifies the sequence of motor actions he can perform that his caregiver will take to be equivalent to each word. In this way, Elija is able to learn the pronunciation of novel words.
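The imitation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dtw_distance`, `best_motor_action`, `imitate_word`), the motor-action labels, the use of plain lists of feature vectors, and the assumption that the caregiver's word arrives already split into segments are all hypothetical simplifications. The sketch only shows the general idea of matching input against stored reformulation templates with DTW and reading off the motor action associated with the closest template.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being aligned
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


def best_motor_action(segment, templates):
    """Return the motor action whose stored reformulation is closest to segment.

    templates maps a (hypothetical) motor-action label to a list of caregiver
    reformulations, each a sequence of feature vectors.
    """
    best_label, best_cost = None, inf
    for label, reformulations in templates.items():
        for template in reformulations:
            c = dtw_distance(segment, template)
            if c < best_cost:
                best_label, best_cost = label, c
    return best_label


def imitate_word(segments, templates):
    """Map each pre-segmented chunk of a heard word to a motor action."""
    return [best_motor_action(seg, templates) for seg in segments]


# Toy data: one-dimensional "acoustic features" stand in for real ones.
templates = {
    "motor_a": [[[0.0], [0.1], [0.0]]],
    "motor_b": [[[1.0], [1.1], [0.9]]],
}
heard_word = [
    [[1.0], [1.0]],                  # resembles the reformulation of motor_b
    [[0.0], [0.05], [0.0], [0.0]],   # resembles the reformulation of motor_a
]
plan = imitate_word(heard_word, templates)
```

In this toy setting `plan` lists one motor-action label per segment. A recognizer working on continuous speech would also have to find the segment boundaries, which this sketch deliberately leaves out.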

Year: 2011
In session: Posters on various topics
Pages: 138 to 145