Conference paper. Year: 2018

Comparing cascaded LSTM architectures for generating head motion from speech in task-oriented dialogs

Abstract

In human-robot interaction (HRI), multimodal interactive behavioral models are typically used to generate action events for a humanoid robot, given the observed actions of the human partner(s). In previous research, we built an interactive model to generate discrete events for gaze and arm gestures, which can be used to drive our iCub humanoid robot [19, 20]. In this paper, we investigate how to generate continuous head motion in the context of a collaborative scenario where head motion contributes to verbal as well as nonverbal functions. We show that in this scenario the fundamental frequency of speech (F0 feature) alone is not sufficient to drive head motion, whereas gaze contributes significantly to its generation. We propose a cascaded Long Short-Term Memory (LSTM) model that first estimates the gaze from speech content and the hand gestures performed by the partner. This estimate is then used as an additional input for generating the head motion. The results show that the proposed method outperforms a single-task model with the same inputs.
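As a rough illustration of the cascaded architecture described in the abstract, a minimal sketch is given below. It assumes a PyTorch implementation in which a first LSTM estimates gaze from speech and partner-gesture features, and a second LSTM generates head motion from the same features plus the estimated gaze. All module names, feature dimensions, and the hidden size are illustrative assumptions, not taken from the paper.

    # Illustrative sketch (assumed PyTorch implementation); feature dimensions
    # and hidden size are placeholders, not values from the paper.
    import torch
    import torch.nn as nn

    class CascadedLSTM(nn.Module):
        """Stage 1 estimates gaze from speech (e.g. F0) and the partner's
        hand-gesture features; stage 2 generates head motion from the same
        inputs plus the estimated gaze."""
        def __init__(self, speech_dim=2, gesture_dim=4, gaze_dim=3, head_dim=3, hidden=64):
            super().__init__()
            self.gaze_lstm = nn.LSTM(speech_dim + gesture_dim, hidden, batch_first=True)
            self.gaze_out = nn.Linear(hidden, gaze_dim)
            self.head_lstm = nn.LSTM(speech_dim + gesture_dim + gaze_dim, hidden, batch_first=True)
            self.head_out = nn.Linear(hidden, head_dim)

        def forward(self, speech, gesture):
            x = torch.cat([speech, gesture], dim=-1)   # (batch, time, features)
            gaze_h, _ = self.gaze_lstm(x)
            gaze = self.gaze_out(gaze_h)               # estimated gaze trajectory
            head_h, _ = self.head_lstm(torch.cat([x, gaze], dim=-1))
            head = self.head_out(head_h)               # generated head motion
            return gaze, head

    # Example: 100 frames of paired speech and gesture features for one dialog turn.
    speech = torch.randn(1, 100, 2)
    gesture = torch.randn(1, 100, 4)
    gaze_pred, head_pred = CascadedLSTM()(speech, gesture)

The sketch only shows the data flow of the cascade; it contrasts with a single-task model that would map the same inputs directly to head motion without the intermediate gaze estimate.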
Main file: dcn_HCII2018.pdf (573 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01848063 , version 1 (24-07-2018)

Identifiers

  • HAL Id : hal-01848063 , version 1

Cite

Duc Canh Nguyen, Gérard Bailly, Frédéric Elisei. Comparing cascaded LSTM architectures for generating head motion from speech in task-oriented dialogs. HCI 2018 - 20th International Conference on Human-Computer Interaction, Jul 2018, Las Vegas, United States. pp.164-175. ⟨hal-01848063⟩
516 Views
347 Downloads
