Setup for Acoustic-Visual Speech Synthesis by Concatenating Bimodal Units - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2010

Abstract

This paper presents preliminary work on building a system that synthesizes the speech signal and a 3D animation of the speaker's face concurrently. This is done by concatenating bimodal diphone units, that is, units that comprise both acoustic and visual information. The visual information is acquired using a stereovision technique. The proposed method addresses the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. Unit selection is based on the classic target and join costs of acoustic-only synthesis, augmented with a visual join cost. Preliminary results indicate the benefits of the approach, since both the synthesized speech signal and the face animation are of good quality. Planned improvements and enhancements to the system are outlined.
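The unit selection described above can be illustrated with a standard Viterbi search over candidate units, where each step adds a target cost and a join cost, and the acoustic join cost is augmented with a visual one. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Unit` fields, the Euclidean distances, the toy one-dimensional features, and the weights `w_t`, `w_ac` and `w_vis` are all hypothetical illustrations, since the abstract does not specify the actual features or cost definitions.

```python
"""Hypothetical sketch of bimodal unit selection (not the paper's code).

A Viterbi search picks one candidate diphone unit per target position,
minimizing weighted target costs plus join costs, where the classic
acoustic join cost is augmented with a visual join cost.
"""
import math
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical bimodal diphone unit; every field is an assumption."""
    label: str            # diphone label
    target_feats: tuple   # features compared against the target specification
    ac_start: tuple       # acoustic feature vector at the unit's first frame
    ac_end: tuple         # acoustic feature vector at the unit's last frame
    vis_start: tuple      # 3D face-marker coordinates at the first frame
    vis_end: tuple        # 3D face-marker coordinates at the last frame


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def target_cost(spec, unit):
    """Mismatch between the desired features and the candidate's features."""
    return dist(spec, unit.target_feats)


def join_cost(prev, nxt, w_ac=1.0, w_vis=1.0):
    """Acoustic join cost plus a weighted visual join cost at the boundary."""
    acoustic = dist(prev.ac_end, nxt.ac_start)
    visual = dist(prev.vis_end, nxt.vis_start)
    return w_ac * acoustic + w_vis * visual


def select_units(specs, candidates, w_t=1.0, w_ac=1.0, w_vis=1.0):
    """Return (best unit sequence, total cost) via dynamic programming.

    specs[i] is the target feature vector for position i and
    candidates[i] is the non-empty list of candidate Units for it.
    """
    # acc[i][j]: cheapest accumulated cost of a path ending in candidates[i][j]
    acc = [[w_t * target_cost(specs[0], u) for u in candidates[0]]]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(specs)):
        row, brow = [], []
        for u in candidates[i]:
            # Find the best predecessor of u, including the join cost into u.
            best_k, best_c = None, math.inf
            for k, prev in enumerate(candidates[i - 1]):
                c = acc[i - 1][k] + join_cost(prev, u, w_ac, w_vis)
                if c < best_c:
                    best_k, best_c = k, c
            row.append(best_c + w_t * target_cost(specs[i], u))
            brow.append(best_k)
        acc.append(row)
        back.append(brow)
    # Backtrack from the cheapest candidate at the final position.
    j = min(range(len(acc[-1])), key=lambda m: acc[-1][m])
    total = acc[-1][j]
    path = []
    for i in range(len(specs) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1], total


# Toy example: with the visual term switched off, b1 wins on acoustics alone;
# with it switched on, b2 wins because its face pose joins smoothly.
a = Unit("a", (0.0,), (0.0,), (0.0,), (0.0,), (0.0,))
b1 = Unit("b1", (0.0,), (0.0,), (0.0,), (5.0,), (5.0,))
b2 = Unit("b2", (0.0,), (1.0,), (1.0,), (0.0,), (0.0,))
specs = [(0.0,), (0.0,)]
audio_only, _ = select_units(specs, [[a], [b1, b2]], w_vis=0.0)
bimodal, _ = select_units(specs, [[a], [b1, b2]], w_vis=1.0)
print([u.label for u in audio_only])  # ['a', 'b1']
print([u.label for u in bimodal])     # ['a', 'b2']
```

Because each unit in this sketch carries its acoustic and visual streams together, selecting a unit fixes both modalities at once; the weight `w_vis` only controls how strongly visual discontinuities at the joins influence that choice.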
Main file: IS10-AT.pdf (602.29 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00526766, version 1 (15-10-2010)

Identifiers

  • HAL Id: inria-00526766, version 1

Cite

Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, et al. Setup for Acoustic-Visual Speech Synthesis by Concatenating Bimodal Units. Interspeech 2010, ISCA, Sep 2010, Makuhari, Chiba, Japan. pp. 486-489. ⟨inria-00526766⟩