A categorization of robust speech processing datasets

Jonathan Le Roux; Emmanuel Vincent

Report (Technical Report), Year: 2014

Jonathan Le Roux

Role: Author

Mitsubishi Electric Research Laboratories

Emmanuel Vincent

Role: Author
PersonId : 1256
IdHAL : emmanuelv
ORCID : 0000-0002-0183-7289
IdRef : 089360176

Analysis, perception and recognition of speech

Abstract

Speech and audio signal processing research is a tale of data collection efforts and evaluation campaigns. While large datasets for automatic speech recognition (ASR) in clean environments with various speaking styles are available, the landscape is not as picture-perfect when it comes to robust ASR in realistic environments, much less so for evaluation of source separation and speech enhancement methods. Many data collection efforts have been conducted, moving along towards more and more realistic conditions, each making different compromises between mostly antagonistic factors: financial and human cost; amount of collected data; availability and quality of annotations and ground truth; naturalness of mixing conditions; naturalness of speech content and speaking style; naturalness of the background noise; etc. In order to better understand what directions need to be explored to build datasets that best support the development and evaluation of algorithms for recognition, separation or localization that can be used in real-world applications, we present here a study of existing datasets in terms of their key attributes.

Domains

Signal and image processing [eess.SP]

Main file

TR2014-116.pdf (229.61 KB)

Origin: Files produced by the author(s)

Contributor: Emmanuel Vincent

https://inria.hal.science/hal-01063805

Submitted on: Saturday, 13 September 2014, 12:38:57

Last modified on: Thursday, 1 February 2024, 10:05:50

Long-term archiving on: Sunday, 14 December 2014, 10:23:23

Dates and versions

hal-01063805, version 1 (13-09-2014)

Identifiers

HAL Id: hal-01063805, version 1

Cite

Jonathan Le Roux, Emmanuel Vincent. A categorization of robust speech processing datasets. [Technical Report] Mitsubishi Electric Research Labs TR2014-116, 2014. ⟨hal-01063805⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
