Speech Recognition Errors in ASR Engines and Their Impact on Linguistic Analysis in Psychotherapies

Abstract:

Modern intervention planning in psychotherapy may benefit from predicting process-relevant psychotherapy constructs through automated speech analysis. One essential step is the extraction of relevant linguistic speech markers with ASR engines, which have to run offline because the data are highly sensitive. We analyze transcription errors from NeMo, Whisper, and Wav2Vec2.0, focusing on their impact on linguistic markers that usually require high-quality transcripts. Using part-of-speech tagging, we examine how errors are distributed across word types. The Linguistic Inquiry and Word Count (LIWC) software is used to extract the markers. We highlight the challenges of transcribing spontaneous speech, which is prevalent in therapy, and compare the results with those obtained on the Mozilla CommonVoice dataset, which features read speech.
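
The per-word-type error analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `errors_by_pos`, the toy lexicon `TOY_POS`, and the example sentences are hypothetical, and the alignment uses Python's standard `difflib.SequenceMatcher` as a stand-in for the edit-distance alignment normally used for word error rate. A real analysis would use a trained part-of-speech tagger instead of a lookup table.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy POS lexicon (an assumption for illustration only);
# a real pipeline would tag the reference transcript with a trained tagger.
TOY_POS = {
    "i": "PRON", "you": "PRON", "feel": "VERB", "felt": "VERB",
    "really": "ADV", "sad": "ADJ", "today": "NOUN", "um": "INTJ",
}


def errors_by_pos(reference, hypothesis, pos_lexicon=TOY_POS):
    """Count reference words that the ASR output substituted or deleted,
    grouped by the word type of the reference word.

    Insertions are ignored here because they have no reference word
    whose type could be counted.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    counts = Counter()
    # Align the two word sequences; autojunk is disabled so that frequent
    # words are not silently dropped from the alignment.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):
            for word in ref[i1:i2]:
                counts[pos_lexicon.get(word, "UNK")] += 1
    return counts


if __name__ == "__main__":
    manual = "um I really feel sad today"   # hypothetical manual transcript
    asr = "I really felt sad today"         # hypothetical ASR output
    for tag, n in sorted(errors_by_pos(manual, asr).items()):
        print(tag, n)
```

In this toy example the filler word is dropped and the verb tense is changed, the kind of error that spontaneous speech is likely to provoke and that can distort word-category counts such as those produced by LIWC.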


Year: 2024
In session: Poster
Pages: 203 to 210