Program

To download the program as a PDF file, please click here. The poster contributions will be presented in the session on Thursday at 15:40. The contributions in all other sessions will be presented as oral talks.

Wednesday, 6 March

13:00 Opening

Welcome addresses by the Vice Dean of the Faculty of Electrical and Computer Engineering of TU Dresden and by the organisers of the ESSV

13:20 Ceremonial lecture (Chair: Peter Birkholz)

Rüdiger Hoffmann

50 years Institute of Acoustics and Speech Communication – 30 years Conference Electronic Speech Signal Processing – 20 years Historic Acoustic-Phonetic Collection

 

14:00 Speech Recognition and Perception (Chair: Bernd Möbius)

Chia Yu Li, Ngoc Thang Vu

Investigation of densely connected convolutional networks with domain adversarial learning for noise robust speech recognition

 

Sabrina Stehwien, Antje Schweitzer, Ngoc Thang Vu

Convolutional neural networks can learn duration for detecting pitch accents and lexical stress

 

Yingming Gao, Hongwei Ding, Peter Birkholz, Rainer Jäckel, Yi Lin

Perception of German tense and lax vowel contrast by Chinese learners

 

 

15:00 Coffee break

 

15:20 Keynote lecture (Chair: Simon Stone)

José Andrés González López

Silent speech interfaces for speech restoration: current status and future challenges

 

16:00 Dialogue Systems (Chair: Felix Burkhardt)

Stefan Hillmann, Klaus-Peter Engelbrecht, Benjamin Weiss

Semi-automatische Generierung und Reinforcement Learning basiertes Training eines Dialogmanagers

 

Eran Raveh, Ingmar Steiner, Ingo Siegert, Iona Gessinger, Bernd Möbius

Comparing phonetic changes in computer-directed and human-directed speech

 

Ivan Kraljevski, Diane Hirschfeld

Analysis and categorization of corrections in multilingual spoken dialogue system

 

 

17:00 Welcome Reception

 

 

19:00 Meeting of the Förderverein ESSV (Müller’s, Bergstr. 78, 01069 Dresden)

 

 

Thursday, 7 March

9:00 Keynote lecture (Chair: Ian Howard)

Katharina von Kriegstein

Speech and voice identity recognition in the human brain

 

9:40 Brain and Cognitive Models (Chair: Ian Howard)

Peter Klimczak, Günther Wirsching, Matthias Wolff

Lernen durch Differenz. Zur logisch-mathematischen Struktur maschinellen Lernens

 

Harald Höge

Extraction of the θ- and γ-cycles active in human speech processing from an articulatory speech database

 

Peter beim Graben, Werner Meyer, Ronald Römer, Matthias Wolff

Bidirektionale Utterance-Meaning-Transducer für Zahlworte durch kompositionale minimalistische Grammatiken

 

 

10:40 Coffee break

 

11:00 Keynote lecture (Chair: Oliver Jokisch)

Korin Richmond

In Articulation for Diversity

 

11:40 Speech Synthesis (Chair: Oliver Jokisch)

Peter Birkholz, Simon Stone, Steffen Kürbis

Comparison of different methods for the voiced excitation of physical vocal tract models

 

Konstantin Sering, Niels Stehwien, Yingming Gao, Martin V. Butz, Harald Baayen

Resynthesizing the GECO speech corpus with VocalTractLab

 

Felix Burkhardt, Milenko Saponja, Julian Sessner, Benjamin Weiss

How should Pepper sound - Preliminary investigations on robot vocalizations

 

 

12:40 Lunch break

 

14:00 Keynote lecture (Chair: Christiane Neuschaefer-Rube)

Ercan Altinsoy

Sprache von Produktgeräuschen – Mensch-Produkt Interaktion

 

14:40 Medical Applications (Chair: Christiane Neuschaefer-Rube)

Kristian Kroschel, Jürgen Metzler

Influence of speech activity on vibrometer signals to extract vital parameters of humans

 

Till Moritz Eßinger, Martin Koch, Matthias Bornitz, Hannes Seidler, Marcus Neudert, Thomas Zahnert

Schnelle Regelung eines monolithischen vollimplantierbaren Hörgeräts

 

 

15:20 Coffee break

 

15:40 Posters and Demonstrations (Chair: Klaus Fellbaum)

 

Ingo Siegert, Jannik Nietzold, Ralph Heinemann, Andreas Wendemuth

The restaurant booking corpus – content-identical comparative human-human and human-computer simulated telephone conversations

 

Thilo Michael, Sebastian Möller

ReTiCo: An open-source framework for modeling real-time conversations in spoken dialogue systems

 

Rohan Shet, Elena Davcheva, Christian Uhle

Segmenting multi-intent queries for spoken language understanding

 

Maria Schmidt, Daniela Stier, Steffen Werner, Wolfgang Minker

Exploration and assessment of proactive use cases for an in-car voice assistant

 

Juliane Höbel-Müller, Ingo Siegert, Ralph Heinemann, Alicia Requardt, Michael Tornow, Andreas Wendemuth

Analysis of the influence of different room acoustics on acoustic emotion features

 

Benjamin Weiss, Thilo Michael, Uwe Reichel, Oliver Pauly

Vergleich verschiedener Machine-Learning Ansätze zur kontinuierlichen Schätzung von perzeptivem Sprechtempo

 

Pavel Denisov, Ngoc Thang Vu

IMS-speech: A speech to text tool

 

Christopher Seitz, Mohammed Krini

Schätzung der spektralen Einhüllenden – Ein Vergleich von tiefen neuronalen Netzen und Codebüchern

 

Ronald Römer, Peter beim Graben, Matthias Wolff

Entscheidungstheoretische Modellierung der konsummatorischen Endhandlung - Vergleich von klassischen und quantenmechanischen Ansätzen

 

Arif Khan, Ingmar Steiner

Multimodal speech segmentation using gaze data and spectrogram image features

 

Ivan Kraljevski, M. Pohl, A. Gjoreski, U. Koloska, J. Wöhl, M. Wenzel, D. Hirschfeld

Design and deployment of multilingual industrial voice control applications

 

Oliver Jokisch, Dominik Fischer

Drone sounds and environmental signals – a first review

 

Falk Gabriel, Patrick Häsner, Eike Dohmen, Dmitry Borin, Peter Birkholz

Surface stickiness and waviness of two-layer silicone structures for synthetic vocal folds

 

Timo Sowa, Soyuj Kumar Sahoo

A toolkit for nested multi-turn speech dialog in automotive environments

 

Susanne Drechsel, Yingming Gao, Jens Frahm, Peter Birkholz

Modell einer Frauenstimme für die artikulatorische Sprachsynthese mit VocalTractLab

 

Hussein Hussein, Burkhard Meyer-Sickendiek, Timo Baumann

How to identify elliptical poems within a digital corpus of auditory poetry

 

Thomas Ranzenberger, Christian Hacker, Karl Weilhammer

Dynamic vocabulary with a Kaldi speech recognizer in a speech dialog system for automotive infotainment applications

 

Mohammad Eslami, Christiane Neuschaefer-Rube, Antoine Serrurier

Automatic vocal tract segmentation based on conditional generative adversarial neural network

 

 

16:40 Departure for the Gläserne Manufaktur


17:30 Start of the guided tour through the Gläserne Manufaktur (meeting point at the bridge over the pond at the Manufaktur on Lennéstraße)


19:00 Conference dinner (Torwirtschaft, Lennéstraße 11, 01069 Dresden)

 


Friday, 8 March

9:00 Keynote lecture (Chair: Rüdiger Hoffmann)

Christian Herbst

The myoelastic-aerodynamic theory of sound production in humans, mammals, and birds

 

9:40 Prosody (Chair: Rüdiger Hoffmann)

Uwe D. Reichel, Benjamin Weiss, Thilo Michael

Filled pause detection by prosodic discontinuity features

 

Jürgen Trouvain, Malte Belz

Zur Annotation nicht-verbaler Vokalisierungen in Korpora gesprochener Sprache

 

Felix Schaeffler, Matthias Eichner, Janet Beck

Towards ordinal classification of voice quality features with acoustic parameters

 

 

10:40 Coffee break

 

11:00 Speech Production (Chair: Jürgen Trouvain)

Alexander Hewer, Ingmar Steiner, Korin Richmond

Analysis of coarticulation using EMA data with a statistical shape space model of the tongue

 

Ian S. Howard, Peter Birkholz

Modelling vowel acquisition using the Birkholz synthesizer

 

Antoine Serrurier, Pierre Badin, Christiane Neuschaefer-Rube

Influence of the vocal tract morphology on the F1-F2 acoustic plane

 

Mario Fleischer, Alexander Mainka, Dirk Mürbe

Numerische Studie zum Einfluss laryngealer Areale auf individuelle und allgemeine akustische Eigenschaften des menschlichen Vokaltrakts bei gehaltenen Vokalen

 

12:20 Closing remarks

Organisers of the 31st ESSV

 



Invited speakers (in alphabetical order):

Ercan Altinsoy, TU Dresden, Germany

Title:
Sprache von Produktgeräuschen – Mensch-Produkt Interaktion

Abstract:
Sound waves are carriers of information, and sounds are correspondingly important to people. In a fascinating way, acoustics builds a bridge between physics and perception. Sound is generated by vibrations, so sound and vibration events are physically coupled. The deliberate design of a product sound therefore begins with the physical description of the vibrations that cause it. We are able to perceive the material, the size or the shape of a struck plate from its sound (or its vibrations); various signal properties, e.g. decay time, resonance frequencies and the spectral centroid, are decisive in this example. This contribution explains the acoustic exchange of information between product and user from different perspectives. The system of meaning formation and meaning assignment is discussed, and exemplary correlations between acoustic events and auditory and meaning events are presented.

Short biography:
Ercan Altinsoy is Professor of Acoustics and Haptics at TU Dresden. His research programme can be summarised under the heading "development of technical devices taking human perception into account". He studied mechanical engineering at Istanbul Technical University and received his doctorate from the Faculty of Electrical Engineering of Ruhr-Universität Bochum, where he also took part in the International Graduate School of Neuroscience. After his doctorate, Prof. Altinsoy worked at HEAD acoustics as a project engineer on NVH (noise, vibration, harshness). He has been at Technische Universität Dresden since 2006. In 2014 he received the prestigious Lothar Cremer Prize of the German Acoustical Society, and last year he was a visiting professor at Tohoku University in Japan.



Jose Gonzalez, University of Malaga, Spain

Title:
Silent speech interfaces for speech restoration: current status and future challenges

Abstract:
Speech production is a rich and complex process, the acoustic signal being just one of the signals stemming from it. In the last few years, the automatic processing of these non-audible speech-related signals has become a research area on its own: silent speech. One of the most exciting applications of silent speech is the possibility of restoring speech to individuals who have lost the ability to communicate orally after a disease or trauma. In this talk I will present an overview of my previous research on silent speech interfaces, where magnetic sensing is used to monitor the movement of the speech articulators and deep learning techniques are used to synthesise speech from the sensor data. In addition to my own research, I will present the current status of silent speech interfaces and discuss common challenges and solutions in this emerging research area.

Short biography:
Jose A. Gonzalez is a lecturer in the Department of Languages and Computer Sciences, University of Malaga, Spain. He received the B.Sc. and Ph.D. degrees in Computer Science, both from the University of Granada, Spain, in 2006 and 2013, respectively. During his Ph.D. he did two research visits at the Speech and Hearing Research Group, University of Sheffield, U.K., to study missing-data approaches for noise-robust speech recognition. From 2013 to 2018 he was a Research Associate at the University of Sheffield working on clinical applications of speech technology. He has authored or co-authored more than 60 international articles published in books, journals, and conference proceedings. He has received several scientific awards for his work, including the AELFA-IF 2018 best paper award, the BioDevices 2018 best paper award, the BioSignals 2015 best paper award, the EUSIPCO 2014 best student paper award, the RTTH best journal paper award, and the AVIOS Speech Application Contest 2007/2008 award.



Christian Herbst, Department of Musicology, Mozarteum University Salzburg, Austria

Title:
The myoelastic-aerodynamic theory of sound production in humans, mammals, and birds

Abstract:
The myoelastic-aerodynamic (MEAD) theory of sound production was proposed over half a century ago to explain how humans produce voice, namely via flow-induced, self-sustained vocal fold oscillation: once a proper pre-phonatory configuration is created, no further differentiated neural input to the larynx is required – the ensuing vocal fold vibration is a passive physical phenomenon.
The MEAD theory has widespread applications to animal sound production: it applies not only to humans, but also to most non-human mammals (from bats to elephants, including non-human primates), extending across a remarkably large range of fundamental frequencies and body sizes, spanning more than five orders of magnitude. A recent publication provides empirical evidence that the MEAD theory is even relevant for birds, which, unlike humans and non-human mammals, produce sounds with a specialized organ, the syrinx.
In this presentation the MEAD theory is reviewed, considering its overall relevance for human communication in relation to (other) mammals. In particular, the potential for common physiological control mechanisms of voice source characteristics across multiple species is discussed.

Short biography:
Christian T. Herbst is an Austrian voice scientist. He studied voice pedagogy at Mozarteum University, Salzburg, Austria, and worked for several years as a voice pedagogue. Driven by his interest in the physics and physiology of the voice, he enrolled in a PhD programme in Biophysics at the University of Olomouc, Czech Republic, from which he graduated in 2012. He currently works on the project "Comparative Biomechanics of Mammalian Sound Production", funded by an APART grant from the Austrian Academy of Sciences. The focus of Christian’s scientific work is on both singing voice physiology and the physics of voice production in mammals. He has received several international scientific awards and has published, among other venues, in the prestigious journal Science.





Katharina von Kriegstein, TU Dresden, Germany

Title:
Speech and voice identity recognition in the human brain

Abstract:
Understanding what is said and recognising the identity of the talker are two important tasks that the brain is faced with in human communication. For a long time, neuroscientific models of speech and voice processing have focused mostly on auditory language and voice-sensitive cerebral cortex regions to explain speech and voice recognition. However, our research has shown that the brain uses even more complex processing strategies for recognising auditory communication signals, such as the recruitment of dedicated visual face areas for auditory processing. In my talk I will give an overview of this work and integrate the findings into a novel view of how the human brain recognises auditory communication signals.

Short biography:
Katharina von Kriegstein is Professor of Cognitive and Clinical Neuroscience at the Psychology Faculty of the Technische Universität Dresden (TUD). Before joining the TUD, Katharina was a professor at Humboldt University of Berlin and a group leader at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig (2009-2017). From 2004 to 2009 she was a postdoc at the Wellcome Trust Centre for Neuroimaging, UCL, London, UK. In her research she focuses on understanding the neural mechanisms that enable humans to communicate successfully with each other. This includes neuroscientific research on typically developed populations as well as on people with communication deficits such as developmental dyslexia or person identity recognition deficits. Katharina is an internationally leading expert on the neuroscience of human communication. Her work has been published in many high-ranking international journals, and she has received prestigious grants such as an ERC Consolidator Grant (2016-2020) and a Max Planck Research Group Grant (2009-2017).




Korin Richmond, The University of Edinburgh, UK

Title:
In Articulation for Diversity

Abstract:
Research comes thick and fast these days - flurries of papers and a deluge of results. Amongst the swathes of work, though, we can frequently discern similar, repeating themes. This is no surprise, because researchers, like people more generally, have long tended to converge on common methods and flock to the hot topics of the day. But this does seem particularly true of work in the most recent past, and especially so in areas related to machine learning such as speech technology. This can be beneficial, but also problematic. In this talk, I will couch this (somewhat philosophical!) issue within a mix of concrete research examples that span from the use of articulatory data for speech technology to signal processing.

Short biography:
Korin Richmond is a Reader in Speech Technology at the Centre for Speech Technology Research (CSTR), University of Edinburgh. He has been involved with human language and speech technology since 1991. His PhD (“Estimating Articulatory Parameters from the Acoustic Speech Signal”, awarded 2002) applied probabilistic modelling to the inversion mapping. His research has since broadened to multiple areas, though often with an emphasis on exploiting articulation, including: statistical parametric speech synthesis (incl. work culminating in a joint IEEE Best Paper Award 2010); unit selection synthesis (he implemented the "MULTISYN" module for the FESTIVAL 2.0 TTS system, which achieved joint third of seventeen in the international Blizzard Challenge 2009); and lexicography (he jointly produced "COMBILEX", an advanced multi-accent lexicon, licensed by Google, Amazon, Microsoft, etc.). Richmond's current work aims to develop ultrasound as a tool for child speech therapy.

