Human Feature Extraction – The Role of the Articulatory Rhythm

Authors: Harald Höge

Abstract:

Neurophysiological investigations [1] hint at a new paradigm for feature extraction not yet used in ASR. This paradigm is based on synchronized brain-to-brain oscillations that are active during speech production and speech perception. This mechanism leads to an evolving theory, which the author calls the Unified Theory of Human Speech Processing (UTHSP). The core elements of this theory are the articulatory rhythm and the articulatory code. Speech is produced by activating a sequence of articulatory codes. Each code is transformed into an articulatory gesture steered by entrained gamma and theta oscillations, called the articulatory rhythm. During each cycle of the rhythm, an articulatory gesture is generated. During the perception of speech, the articulatory rhythm of the speaker is reconstructed in the brain of the listener. In the cortex, the stream of spectro-temporal features delivered by the midbrain is aligned to phrases, syllables and phones, steered by the articulatory rhythm. During each cycle of the rhythm, the aligned spectro-temporal features are integrated and transformed into a bundle of articulatory features. Each bundle generated in a cycle describes a cycle-gesture. In phonetics, each phoneme is described by a phone-gesture. The cycle-gestures seem to have a different structure than the phone-gestures; thus, the relation between the cycle-gestures and the related phonetic units is unknown. Human feature extraction is finalized by transforming each bundle of articulatory features into an articulatory code as used in speech production. Based on the UTHSP, an architecture for mimicking human feature extraction is presented.
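To make the cycle-wise processing described above concrete, the following is a minimal sketch (not the author's implementation) of how spectro-temporal features could be integrated per rhythm cycle and mapped to discrete articulatory codes. The frame rate, the assumed 5 Hz theta rhythm, the mean-pooling integration, and the nearest-neighbour codebook lookup are all illustrative assumptions, not the architecture proposed in the paper.

```python
# Hedged sketch of cycle-wise feature integration and code assignment.
# All rates, the pooling operator and the codebook are illustrative assumptions.
import numpy as np

FRAME_RATE_HZ = 100      # assumed frame rate of the spectro-temporal features
THETA_RATE_HZ = 5        # assumed syllable/theta rhythm (roughly 4-8 Hz)
FRAMES_PER_CYCLE = FRAME_RATE_HZ // THETA_RATE_HZ


def integrate_per_cycle(spectro_temporal: np.ndarray) -> np.ndarray:
    """Integrate frame-wise features within each theta cycle.

    spectro_temporal: array of shape (num_frames, num_bands).
    Returns one feature bundle (here: the mean) per complete cycle.
    """
    num_cycles = spectro_temporal.shape[0] // FRAMES_PER_CYCLE
    trimmed = spectro_temporal[: num_cycles * FRAMES_PER_CYCLE]
    cycles = trimmed.reshape(num_cycles, FRAMES_PER_CYCLE, -1)
    return cycles.mean(axis=1)           # one "cycle-gesture" bundle per cycle


def to_articulatory_codes(bundles: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature bundle to the index of its nearest codebook entry."""
    distances = np.linalg.norm(bundles[:, None, :] - codebook[None, :, :], axis=-1)
    return distances.argmin(axis=1)      # one articulatory code per cycle


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((300, 24))   # 3 s of dummy 24-band features
    codebook = rng.standard_normal((16, 24))    # hypothetical articulatory codebook
    bundles = integrate_per_cycle(features)
    codes = to_articulatory_codes(bundles, codebook)
    print(codes)                                # one code per ~200 ms cycle
```

In this toy version the rhythm is a fixed 5 Hz grid; in the theory sketched in the abstract, the cycle boundaries would instead be derived from the reconstructed articulatory rhythm of the speaker.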


Year: 2017
In session: Kognitive Modelle
Pages: 364 to 371