Conference paper. Year: 2018

Comparing cascaded LSTM architectures for generating head motion from speech in task-oriented dialogs

Abstract

In human-robot interaction (HRI), multimodal interactive behavioral models are typically used to generate action events for a humanoid robot, given the observed actions of the human partner(s). In previous research, we built an interactive model to generate discrete events for gaze and arm gestures, which can be used to drive our iCub humanoid robot [19, 20]. In this paper, we investigate how to generate continuous head motion in the context of a collaborative scenario where head motion contributes to verbal as well as nonverbal functions. We show that in this scenario the fundamental frequency of speech (F0 feature) alone is not sufficient to drive head motion, whereas gaze contributes significantly to its generation. We propose a cascaded Long Short-Term Memory (LSTM) model that first estimates the gaze from speech content and the hand gestures performed by the partner. This estimate is then used as an additional input for generating the head motion. The results show that the proposed method outperforms a single-task model with the same inputs.
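As a rough illustration of the cascaded architecture described in the abstract, a minimal sketch is given below. It assumes a PyTorch implementation in which a first LSTM estimates gaze from speech and partner-gesture features, and a second LSTM generates head motion from the same features plus the estimated gaze. All module names, feature dimensions, and the hidden size are illustrative assumptions, not taken from the paper.

    # Illustrative sketch (assumed PyTorch implementation); feature dimensions
    # and hidden size are placeholders, not values from the paper.
    import torch
    import torch.nn as nn

    class CascadedLSTM(nn.Module):
        """Stage 1 estimates gaze from speech (e.g. F0) and the partner's
        hand-gesture features; stage 2 generates head motion from the same
        inputs plus the estimated gaze."""
        def __init__(self, speech_dim=2, gesture_dim=4, gaze_dim=3, head_dim=3, hidden=64):
            super().__init__()
            self.gaze_lstm = nn.LSTM(speech_dim + gesture_dim, hidden, batch_first=True)
            self.gaze_out = nn.Linear(hidden, gaze_dim)
            self.head_lstm = nn.LSTM(speech_dim + gesture_dim + gaze_dim, hidden, batch_first=True)
            self.head_out = nn.Linear(hidden, head_dim)

        def forward(self, speech, gesture):
            x = torch.cat([speech, gesture], dim=-1)   # (batch, time, features)
            gaze_h, _ = self.gaze_lstm(x)
            gaze = self.gaze_out(gaze_h)               # estimated gaze trajectory
            head_h, _ = self.head_lstm(torch.cat([x, gaze], dim=-1))
            head = self.head_out(head_h)               # generated head motion
            return gaze, head

    # Example: 100 frames of paired speech and gesture features for one dialog turn.
    speech = torch.randn(1, 100, 2)
    gesture = torch.randn(1, 100, 4)
    gaze_pred, head_pred = CascadedLSTM()(speech, gesture)

The sketch only shows the data flow of the cascade; it contrasts with a single-task model that would map the same inputs directly to head motion without the intermediate gaze estimate.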
Main file: dcn_HCII2018.pdf (573 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01848063 , version 1 (24-07-2018)

Identifiers

  • HAL Id : hal-01848063 , version 1

Cite

Duc Canh Nguyen, Gérard Bailly, Frédéric Elisei. Comparing cascaded LSTM architectures for generating head motion from speech in task-oriented dialogs. HCI 2018 - 20th International Conference on Human-Computer Interaction, Jul 2018, Las Vegas, United States. pp.164-175. ⟨hal-01848063⟩
516 Views
347 Downloads
