Comparison of Training Behaviour and Performance of Reinforcement Learning based Policies for Dialogue Management

Abstract:

We present the results of an extensive comparison of four reinforcement learning algorithms used to train policies for dialogue management. We trained 32 policies by varying the concept error rate, the number of user dialogue acts, and the number of training dialogues. We report data on the training behaviour and on the performance of the trained policies in evaluation. Among the evaluated algorithms (actor-critic, REINFORCE, Q-Learning, and WoLF-PHC), actor-critic leads to very good task success rates and notably shorter dialogues.


Year: 2021
In session: Sprachdialog
Pages: 239 to 246