Improving Phoneme Set Discovery for Documenting Unwritten Languages


Many of the 7,000 living languages in the world are currently threatened with extinction. To preserve these languages and the cultural heritage linked with them, they need to be documented. This is a challenging and time-consuming task, even for trained specialists. Helping linguists in language documentation is the goal of the French-German ANR-DFG project BULB. The first step in documenting a language is the discovery of its phonetic inventory. We aim to assist linguists during this step by proposing a segmentation of audio data into phoneme-like units and by clustering these units using articulatory features. In this work, we refine our existing approach by using Deep Bidirectional LSTM (DBLSTM) networks, which increased the recognition accuracy for articulatory features.

Year: 2017
In session: Sprachmodellierung (Language Modeling)
Pages: 202 to 209