Upcoming new ITU-T recommendation on the evaluation of text-based chatbots


The evaluation of spoken dialogue systems has been an object of scientific research for decades. Whereas standardized methods for such systems have been made available by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T), a comparable level of maturity is still missing for the evaluation of text-based chatbots. This contribution presents ongoing work on a new ITU-T Recommendation that describes subjective evaluation methods for quantifying the quality of services relying on text-based chatbots, as experienced by the users of such services. The chatbots addressed by the upcoming Recommendation enable natural language interaction with a human user via a text interface on a turn-by-turn basis. They possess natural language understanding, dialogue management, and natural language generation capabilities. The evaluation methods address different aspects of quality from the user’s point of view, treating the chatbot as a black box. They are based on laboratory or remote experiments in which participants interact with the chatbot in order to perform a pre-defined, realistic task. Participants’ opinions on perceptual quality dimensions are collected with the help of questionnaires, examples of which are provided.

Year: 2022
In session: Voice Assistants & Speech Dialogue Systems
Pages: 97 to 104