N-Best Rescoring based on Intonation Prediction for a Spanish ASR System

Abstract:

This paper presents a novel method for rescoring the n-best recognition hypotheses using intonation knowledge. The model synthesizes an f0 contour for each of the n-best hypotheses and estimates an intonative matching index between each synthetic contour and the real f0 contour. This index is applied in the rescoring process and can be viewed as a degree of intonation compatibility between a hypothesis and the input sentence. The f0 prediction is based on classification and regression trees and the Fujisaki model. We evaluate our approach using a single speaker from the Buenos Aires Spanish LIS-SECYT database under clean and babble-noise conditions. For systems operating without a grammar, the proposed model reduces the mean absolute word error rate by 3.1% compared with the baseline system, consistently across the different noise conditions.
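
The rescoring idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the matching index or how it is combined with the recognizer score, so the sketch assumes a Pearson correlation between contours and a simple weighted sum. The names `matching_index`, `rescore_nbest`, `predict_f0` and `weight` are hypothetical, and `predict_f0` stands in for the CART and Fujisaki-model predictor.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple


def matching_index(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between two equal-length f0 contours.

    Assumption: the abstract does not specify the index; correlation is
    used here only as a plausible stand-in.
    """
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("contours must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        return 0.0  # a flat contour carries no shape information
    return cov / sqrt(var_p * var_o)


def rescore_nbest(
    hypotheses: List[Tuple[str, float]],
    observed_f0: Sequence[float],
    predict_f0: Callable[[str], Sequence[float]],
    weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Add a weighted intonation match to each recognizer score and re-rank.

    hypotheses: (text, recognizer score) pairs, higher score is better.
    predict_f0: hypothetical f0 predictor returning a contour per hypothesis.
    """
    rescored = [
        (text, score + weight * matching_index(predict_f0(text), observed_f0))
        for text, score in hypotheses
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy usage with made-up contours (Hz) and scores.
toy_contours = {
    "hypothesis a": [100.0, 90.0, 95.0, 120.0],
    "hypothesis b": [105.0, 125.0, 115.0, 95.0],
}
observed = [100.0, 120.0, 110.0, 90.0]
nbest = [("hypothesis a", -10.0), ("hypothesis b", -10.5)]
ranked = rescore_nbest(nbest, observed, toy_contours.__getitem__, weight=2.0)
```

In this toy case the second hypothesis has the lower recognizer score but a contour that matches the observed one, so it moves to the top of the list after rescoring.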


Year: 2010
In session: Speech Recognition
Pages: 234 to 233