Automatic User Experience Evaluation of Goal-Oriented Dialogs Using Pre-Trained Language Models

Abstract:

Dialog evaluation methods based on Pre-trained Language Models (Pr-LMs) have primarily been used for open-domain dialogs with the goal of comparing systems in terms of dialog skills relevant to casual chats, such as naturalness, engagement, and relevance. Automatic evaluation metrics for goal-oriented and closed-domain dialogs often measure only a few objective metrics, such as task success rate, and ignore subjective aspects of the User Experience (UX). Important subjective usability aspects like satisfaction go beyond simple objective metrics and have traditionally been assessed using questionnaires in an experimental setup. Information about subjective UX is often implicitly contained in the dialog text, which could therefore be used to estimate the true UX in an automated fashion using Pr-LMs. This work aims to explore automatic text-based and multifaceted UX evaluation of goal-oriented chatbot interactions using Pr-LMs. We examine both a supervised learning approach and an approach based on an automatic, reference-free, and unsupervised dialog evaluation metric. With supervised learning, we train a Pr-LM that predicts several relevant UX aspects with moderate correlation values. SimCSE embeddings perform best and even outperform the UX ratings of human observers collected in a previous study. While the reference-free approach achieves low to moderate correlations, we suspect that this method mainly exploits the correlation between dialog length and user satisfaction and could hence fail in scenarios where these are not correlated.


Year: 2023
In session: Interaction & Dialogue
Pages: 32–39