Evaluating chain-of-thought prompting for abstractive dialogue summarization with large language models for German

Abstract:

Dialogue summarization is a key NLP task that captures conversational nuances while conveying essential information. It has practical applications in doctor-patient interactions, customer service, and multi-speaker meetings, enabling effective review of discussions. However, the scarcity of dialogue summarization datasets, especially in non-English languages, poses a challenge. This paper explores abstractive dialogue summarization using two large English datasets, SAMSum and DialogSum, both translated into German. We compare 3-step Chain-of-Thought (CoT) prompting with simple (1-step) prompting across four state-of-the-art Large Language Models (LLMs). Model performance is evaluated using ROUGE and BERTScore metrics. Our findings show that CoT prompting outperforms simple prompting on SAMSum across all models used, while further research is needed to validate this approach for DialogSum.
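As a point of reference for the evaluation setup mentioned above, the following is a minimal sketch of computing ROUGE and BERTScore on German summaries with the Hugging Face `evaluate` library. This is one possible implementation, not the authors' pipeline; the predictions and references below are illustrative placeholders, not data from SAMSum or DialogSum.

import evaluate

# Hypothetical model-generated summaries and German reference summaries.
predictions = [
    "Anna und Ben verabreden sich fuer Samstagabend im Kino.",
    "Der Kunde meldet ein defektes Geraet und erhaelt ein Ersatzgeraet.",
]
references = [
    "Anna und Ben gehen am Samstag zusammen ins Kino.",
    "Der Support tauscht das defekte Geraet des Kunden aus.",
]

# ROUGE: n-gram overlap between generated and reference summaries.
rouge = evaluate.load("rouge")
rouge_scores = rouge.compute(predictions=predictions, references=references)

# BERTScore: embedding-based similarity; lang="de" selects a German-capable model.
bertscore = evaluate.load("bertscore")
bert_scores = bertscore.compute(predictions=predictions, references=references, lang="de")

print({k: round(v, 4) for k, v in rouge_scores.items()})
print("BERTScore F1:", round(sum(bert_scores["f1"]) / len(bert_scores["f1"]), 4))

Note that ROUGE is a surface-level overlap measure originally designed for English, which is one reason an embedding-based metric such as BERTScore is typically reported alongside it for German outputs.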


Year: 2025
Session: Poster
Pages: 265–272