Optimizing the coverage of a speech database through a selection of representative speaker recordings

Sacha Krstulovic; Frédéric Bimbot; Olivier Boëffard; Delphine Charlet; Dominique Fohr; Odile Mella

doi:10.1016/j.specom.2006.07.002

Journal article, Speech Communication, Year: 2006


Sacha Krstulovic

Role: Author

Speech and sound data modeling and processing

Frédéric Bimbot

Role: Author
PersonId: 830967

Speech and sound data modeling and processing

Olivier Boëffard

Role: Author

Human-machine spoken dialogue

Delphine Charlet

Role: Author

France Télécom Recherche & Développement

Dominique Fohr

Role: Author
PersonId: 15652
IdHAL: dominique-fohr
IdRef: 031092942

Analysis, perception and recognition of speech

Odile Mella

Role: Author
PersonId: 15902
IdHAL: odile-mella
IdRef: 12011903X

Analysis, perception and recognition of speech

Abstract

In the context of the Neologos French speech database creation project, we have defined a general methodology for selecting representative speaker recordings. The selection aims at ensuring good coverage of speaker variability while limiting the number of recorded speakers. This makes the resulting database both better suited to the development of recently proposed multi-model methods and cheaper to collect. The methodology performs the selection by optimizing a quality criterion that can be defined within a variety of speaker similarity modeling frameworks. The selection can be performed and validated with respect to a single similarity criterion, using classical clustering methods such as Hierarchical or K-Medians clustering, or across several speaker similarity criteria, using a newly developed clustering method called Focal Speakers Selection. Within this framework, four different speaker similarity criteria are tested and three different speaker clustering algorithms are compared. Results pertaining to the collection of the Neologos database are also discussed.
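As an illustration of the single-criterion case, here is a minimal sketch of K-Medians-style speaker selection. It assumes a precomputed matrix of pairwise speaker dissimilarities with strictly positive off-diagonal values; the paper's actual similarity measures and quality criterion are not reproduced here. The sketch alternates between assigning each speaker to its nearest selected speaker and re-picking, within each cluster, the member with the smallest summed dissimilarity to the other members (a K-medoids-style update). The function name `k_medians_select`, its parameters and the toy data are hypothetical.

```python
import random


def k_medians_select(dist, k, n_iter=100, seed=0):
    """Select k representative speakers from a pairwise dissimilarity matrix.

    Hypothetical helper, not the authors' implementation.
    dist[i][j] is the dissimilarity between speakers i and j; distinct
    speakers are assumed to have strictly positive dissimilarity, so every
    selected speaker is assigned to itself and no cluster is empty.
    Returns (sorted indices of selected speakers, total dissimilarity of
    every speaker to its closest selected speaker).
    """
    n = len(dist)
    rng = random.Random(seed)
    medians = rng.sample(range(n), k)  # random initial selection
    for _ in range(n_iter):
        # Assignment step: attach each speaker to its closest selected speaker.
        clusters = {m: [] for m in medians}
        for i in range(n):
            nearest = min(medians, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: in each cluster, keep the member whose summed
        # dissimilarity to the other members is smallest.
        new_medians = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medians) == sorted(medians):
            break  # selection is stable: converged
        medians = new_medians
    cost = sum(min(dist[i][m] for m in medians) for i in range(n))
    return sorted(medians), cost


# Toy usage: six "speakers" on a line, two obvious groups.
positions = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
print(k_medians_select(toy_dist, 2))  # -> ([1, 4], 4)
```

Because the update can only pick existing members, every selected centre is an actual speaker who can then be recorded; this is one reason a median-based update, rather than a mean-based one, is a natural fit for choosing speakers to collect.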

Keywords

speech database; cost minimization; speaker selection; speaker clustering; optimal coverage; multi-models; speech and speaker recognition

Domains

Human-Computer Interaction [cs.HC]


https://hal.science/hal-00110509

Submitted on: Monday, 30 October 2006, 14:03:06

Last modified on: Friday, 24 March 2023, 14:52:48

Dates and versions

hal-00110509, version 1 (30-10-2006)

Identifiers

HAL Id: hal-00110509, version 1
DOI: 10.1016/j.specom.2006.07.002

Cite

Sacha Krstulovic, Frédéric Bimbot, Olivier Boëffard, Delphine Charlet, Dominique Fohr, et al. Optimizing the coverage of a speech database through a selection of representative speaker recordings. Speech Communication, 2006, 48 (10), pp.1319-1348. ⟨10.1016/j.specom.2006.07.002⟩. ⟨hal-00110509⟩


Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D5 UNIV-LORRAINE INRIA2 LORIA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM
