Journal Article, Speech Communication, Year: 2006

Optimizing the coverage of a speech database through a selection of representative speaker recordings

Abstract

In the context of the Neologos French speech database creation project, we have defined a general methodology for selecting representative speaker recordings. The selection aims to ensure good coverage of speaker variability while limiting the number of recorded speakers. This makes the resulting database both better suited to the development of recently proposed multi-model methods and cheaper to collect. The methodology performs the selection by optimizing a quality criterion that can be defined in a variety of speaker similarity modeling frameworks. The selection can be carried out and validated with respect to a single similarity criterion, using classical clustering methods such as Hierarchical or K-Medians clustering, or across several speaker similarity criteria, using a newly developed clustering method called Focal Speakers Selection. Within this framework, four speaker similarity criteria are tested and three speaker clustering algorithms are compared. Results pertaining to the collection of the Neologos database are also discussed.
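
As a rough illustration of selecting representative speakers by optimizing a quality criterion, the sketch below picks k speakers from a precomputed dissimilarity matrix so that the summed distance from every speaker to its closest selected speaker is minimized. It is a generic medoid-style swap search and is not the article's K-Medians or Focal Speakers Selection algorithm. The names `coverage_cost` and `select_representatives`, the cost definition, and the toy distances are all assumptions made for this example.

```python
def coverage_cost(dist, selected):
    """Sum, over all speakers, of the distance to the closest selected speaker.

    `dist` is a square list-of-lists dissimilarity matrix (an assumed input
    format); a lower cost means the selection covers the population better.
    """
    return sum(min(dist[i][j] for j in selected) for i in range(len(dist)))


def select_representatives(dist, k, max_iter=100):
    """Greedy swap search (medoid-style) for k representative speakers.

    Starts from an arbitrary selection and keeps any single swap that lowers
    the coverage cost, until no swap helps. This finds a local optimum only.
    """
    n = len(dist)
    selected = list(range(k))  # arbitrary initial selection
    best = coverage_cost(dist, selected)
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in selected:
                    continue
                # Try replacing the speaker at `pos` with candidate `cand`.
                trial = selected[:pos] + [cand] + selected[pos + 1:]
                cost = coverage_cost(dist, trial)
                if cost < best:
                    selected, best, improved = trial, cost, True
        if not improved:
            break
    return sorted(selected), best


# Toy data (hypothetical): six "speakers" placed on a line, forming two groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
representatives, cost = select_representatives(dist, k=2)
print(representatives, cost)
```

In this toy case the search settles on the middle speaker of each group. With real data, the dissimilarity matrix would come from whichever speaker similarity criterion is being evaluated, and several random restarts would usually be advisable because the swap search can stop in a local optimum.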

Dates and versions

hal-00110509 , version 1 (30-10-2006)

Cite

Sacha Krstulovic, Frédéric Bimbot, Olivier Boëffard, Delphine Charlet, Dominique Fohr, et al. Optimizing the coverage of a speech database through a selection of representative speaker recordings. Speech Communication, 2006, 48 (10), pp.1319-1348. ⟨10.1016/j.specom.2006.07.002⟩. ⟨hal-00110509⟩