Collecting and Annotating Natural Child Speech Data – Challenges and Interdisciplinary Perspectives

Abstract:

In this paper we share our experiences with collecting and annotating child speech data, drawing on our background in speech-language therapy and on the TALC project (Tools for Analyzing Language and Communication), in which an interdisciplinary team explores the application of machine learning models, with a focus on automatic speech recognition (ASR), for linguistic and speech therapy purposes. We reflect on the importance of collecting natural speech data for ASR model training and summarize recommended methods for eliciting such spontaneous child speech at different ages. For the annotation of recorded data, that is, transcribing it and marking relevant parts for subsequent analysis, we focus on possible ways to ensure communication between different researchers. Throughout, we elaborate on the interdisciplinary collaboration in our project, which aims to ensure that the requirements of model developers and end users are met.


Year: 2023
In session: Child Speech
Pages: 72 to 78