DevOps Manifesto for Speech Corpus Management

Authors: Ingmar Steiner

Abstract:

In this paper, we introduce concepts from the DevOps philosophy and, more generally, from the software development lifecycle. We argue that the separation between source code and the way it is built and released for distribution can be applied to speech corpora as well. We distinguish between the developers and maintainers of a speech corpus on the one hand and the researchers who use it on the other. We propose conventions for efficiently managing corpus metadata in the same way as source code, and speech data in the same way as static assets that can be retrieved automatically. Finally, we mention several use cases that illustrate the merits of these conventions.


Year: 2017
In session: Poster
Pages: 160 to 166