Development of automatic Amharic speech recognizer

Yitagessu Birhanu Gebremedhin; Rüdiger Hoffmann

Abstract:

Amharic is one of the least researched languages in the world. In particular, speech and language technologies for it are almost nonexistent. Off-the-shelf speech corpora, lexical models, and language models are not available, which makes building automatic Amharic speech recognizers very challenging. We present initial results in the development of an Amharic speech recognizer. Its most important components, namely the speech corpus, the lexical model, and the language model, are developed from scratch. The Amharic speech corpus was collected from speakers of different ages and genders in such a way that it covers all syllables in reasonable proportions. A lexical model consisting of hundreds of thousands of words and a finite-state-automaton-based language model have also been prepared. The speech recognizer is being developed using the UASR (Unified Approach to Speech Synthesis and Recognition) toolkit of TU Dresden. Once it is ready, we will integrate it with other modules for further research, particularly the development of an Amharic speech to Ethiopian Sign Language (ESL) converter.

Year: 2011
In session: Posters on various topics (Poster zu verschiedenen Themenbereichen)
Pages: 118 to 122