Voice Conversion Based on Spectral Envelope Transformation

Robert Vich; Martin Vondra

Abstract:

This paper presents a new voice conversion algorithm that transforms an utterance of a source speaker into an utterance of a target speaker or of a new, unknown speaker. The algorithm is based on spectral speech analysis, frequency transformation, spectral envelope warping, spectrum interpolation, and high-quality parametric IIR or FIR cepstral speech synthesis. The cepstral speech model is obtained from the short-time discrete Fourier transform of overlapping, pitch-asynchronous speech frames and from speech deconvolution in the cepstral domain. Cepstral speech synthesis is performed pitch-synchronously. In contrast to the LPC speech model, the cepstral speech model is of the pole/zero type and also contains information about the vocal tract excitation. Several approaches to frequency transformation of the speech spectrum are compared, e.g. linear frequency scaling, piecewise linear frequency warping, and nonlinear lowpass-to-lowpass frequency transformation. The choice of spectral warping depends on the desired accuracy with which the formants of the source spectrum are mapped onto those of the target spectrum. Prosodic transformations, i.e. modifications of fundamental frequency, time scale, and intensity, are also briefly mentioned.
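
As a rough illustration of piecewise linear frequency warping, the following minimal sketch moves a peak of a sampled spectral envelope from a source frequency to a target frequency. It is not the authors' implementation: the function names (`interp`, `warp_envelope`), the uniform frequency grid from 0 Hz to the Nyquist frequency, the anchor frequencies, and the toy triangular envelope are all assumptions made for this example.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Linearly interpolate the points (xs, ys) at x, clamping at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def warp_envelope(envelope, fs, src_anchors, tgt_anchors):
    """Piecewise linear warping of a sampled spectral envelope (hypothetical helper).

    envelope    : envelope samples on a uniform grid from 0 Hz to fs/2 inclusive
    src_anchors : increasing frequencies (Hz) of features in the source envelope
    tgt_anchors : increasing frequencies (Hz) where those features should land
    0 Hz and the Nyquist frequency are always mapped onto themselves.
    """
    n = len(envelope)
    nyquist = fs / 2.0
    freqs = [k * nyquist / (n - 1) for k in range(n)]
    src = [0.0] + list(src_anchors) + [nyquist]
    tgt = [0.0] + list(tgt_anchors) + [nyquist]
    warped = []
    for f in freqs:
        # Inverse mapping: find the source frequency that lands on output bin f.
        f_src = interp(f, tgt, src)
        warped.append(interp(f_src, freqs, envelope))
    return warped


# Toy envelope (assumption): one triangular "formant" centred at 1000 Hz.
fs = 8000
n_bins = 41  # 100 Hz spacing between 0 and 4000 Hz
grid = [k * (fs / 2.0) / (n_bins - 1) for k in range(n_bins)]
env = [max(0.0, 1.0 - abs(f - 1000.0) / 500.0) for f in grid]

# Shift the peak from 1000 Hz to 1200 Hz.
warped = warp_envelope(env, fs, [1000.0], [1200.0])
peak_hz = grid[warped.index(max(warped))]
print(peak_hz)
```

With a single anchor pair the mapping is linear below the anchor and compresses the band above it so that the Nyquist frequency stays fixed; adding one anchor pair per formant gives a finer formant mapping at the cost of needing formant estimates for both speakers.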

Year: 2004
In session: Sprachsynthese
Pages: 148 to 155