First Step Towards Enhancing Word Embeddings with Pitch Accent Features for DNN-based Slot Filling on Recognized Text


Slot filling, a subtask of spoken language understanding, extracts key query terms from text that has been recognized from speech. Most state-of-the-art models, however, do not take recognition errors into account, and their performance drops substantially when they are applied to recognized text. Prosody is one source of information that marks important parts of utterances and is available from the speech data. Since pitch accents have been shown to correlate with semantic slots in the ATIS benchmark corpus, we combine pitch accent features with word embeddings for slot filling on ATIS and compare their impact on the performance of two state-of-the-art models when applied to recognized text. Our experimental results and analysis show that extending word embeddings with pitch accent features slightly improves slot filling systems on recognized text.
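
To make the idea of extending word embeddings with pitch accent features concrete, here is a minimal sketch. The abstract does not say how the features are encoded or combined, so this sketch assumes one binary accent flag per word, appended to that word's embedding vector. The function name `extend_with_pitch_accent`, the toy embedding values, and the accent labels are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def extend_with_pitch_accent(
    embeddings: Sequence[Sequence[float]],
    accented: Sequence[bool],
) -> List[List[float]]:
    """Append a binary pitch accent flag to each word's embedding vector.

    Assumption: one flag per word, concatenated as an extra dimension.
    The paper may use a different encoding.
    """
    if len(embeddings) != len(accented):
        raise ValueError("need exactly one accent flag per word")
    return [
        list(vec) + [1.0 if acc else 0.0]
        for vec, acc in zip(embeddings, accented)
    ]


# Toy 3-dimensional embeddings for the words "flights to boston" (made-up values).
toy_embeddings = [[0.1, 0.2, 0.3], [0.0, 0.1, 0.0], [0.7, 0.5, 0.9]]
# Hypothetical accent labels: "flights" and "boston" carry a pitch accent.
toy_accents = [True, False, True]

extended = extend_with_pitch_accent(toy_embeddings, toy_accents)
```

Each vector in `extended` has one more dimension than its input embedding, so a slot filling model that consumes these vectors would need its input layer sized accordingly.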

Year: 2017
In session: Sprachmodellierung
Pages: 194 to 201