Evaluating chain-of-thought prompting for abstractive dialogue summarization with large language models for German
Authors: Neha Deshpande, Stefan Hillmann, Sebastian Möller
Abstract:
Dialogue summarization is a key NLP task that aims to capture conversational nuances while conveying essential information. It has practical applications in doctor-patient interactions, customer service, and multi-speaker meetings, where it enables effective review of discussions. However, the lack of dialogue summarization datasets, especially for non-English languages, poses a challenge. This paper explores abstractive dialogue summarization for German using two large English datasets, SAMSum and DialogSum, both translated into German. We compare 3-step Chain-of-Thought (CoT) prompting with simple (1-step) prompting across four state-of-the-art Large Language Models (LLMs) and evaluate model performance using the ROUGE and BERTScore metrics. Our findings show that CoT prompting outperforms simple prompting on SAMSum for all four models, while further research is needed to validate this approach for DialogSum.


