Filled pause detection by prosodic discontinuity features


In this study we aim to predict filled pauses (FP) by analysing energy, f0, and perceived local speaking rate contours. Prosodic feature profiles are introduced for FP and non-FP segments as well as for discontinuities at their transitions. Interpretations of those profiles and their discriminatory power are given. Based on the extracted prosodic features, we trained Random Forest classifiers for FP detection on three different units of classification: manually segmented syllables, automatically detected syllables, and equally spaced time stamps. The advantages and shortcomings of these units are discussed. Based on prosodic features only, well-balanced FP recall and precision values between 0.82 and 0.86 were achieved.
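
As an illustration of what a discontinuity feature at equally spaced time stamps might look like, the following minimal sketch computes, for each time stamp, the difference between the mean contour value just after and just before it. The abstract does not specify the actual feature definitions, so the function names (`discontinuity`, `features_at_timestamps`), the window length `win`, and the spacing `step` are all hypothetical choices made for this example, not the authors' method.

```python
from statistics import mean


def discontinuity(contour, idx, win=3):
    """Mean of the `win` samples from idx onward minus the mean of the
    `win` samples before idx (hypothetical discontinuity measure)."""
    before = contour[max(0, idx - win):idx]
    after = contour[idx:idx + win]
    if not before or not after:
        # No context on one side (e.g. at the start of the contour).
        return 0.0
    return mean(after) - mean(before)


def features_at_timestamps(energy, f0, step=2, win=3):
    """Build one feature row per equally spaced time stamp (assumed layout)."""
    n = min(len(energy), len(f0))
    rows = []
    for idx in range(0, n, step):
        rows.append({
            "index": idx,
            "energy_delta": discontinuity(energy, idx, win),
            "f0_delta": discontinuity(f0, idx, win),
        })
    return rows
```

Rows of this kind, labelled FP or non-FP, could then serve as input to a classifier such as a Random Forest; the same measure could equally be evaluated at syllable boundaries instead of fixed time stamps.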

Year: 2019
In session: Prosodie
Pages: 272 to 279