LibriMix: An open-source dataset for generalizable speech separation

Joris Cosentino; Manuel Pariente; Samuele Cornell; Antoine Deleforge; Emmanuel Vincent

Preprint, Working Paper. Year: 2020


Joris Cosentino

Role: Author
PersonId : 1111325

Speech Modeling for Facilitating Oral-Based Communication

Manuel Pariente

Role: Author
PersonId : 1045122

Speech Modeling for Facilitating Oral-Based Communication

Samuele Cornell

Role: Author
PersonId : 1111326

Polytechnic University of Marche [Ancona, Italy] / Università Politecnica delle Marche [Ancona, Italia]

Antoine Deleforge

Role: Author
PersonId : 10056
IdHAL : antoine-deleforge
ORCID : 0000-0003-0339-7472
IdRef : 184451205

Speech Modeling for Facilitating Oral-Based Communication

Emmanuel Vincent

Role: Author
PersonId : 1256
IdHAL : emmanuelv
ORCID : 0000-0002-0183-7289
IdRef : 089360176

Speech Modeling for Facilitating Oral-Based Communication

Abstract

In recent years, wsj0-2mix has become the reference dataset for single-channel speech separation. Most deep learning-based speech separation models today are benchmarked on it. However, recent studies have shown important performance drops when models trained on wsj0-2mix are evaluated on other, similar datasets. To address this generalization issue, we created LibriMix, an open-source alternative to wsj0-2mix, and to its noisy extension, WHAM!. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!. Using Conv-TasNet, we achieve competitive performance on all LibriMix versions. In order to fairly evaluate across datasets, we introduce a third test set based on VCTK for speech and WHAM! for noise. Our experiments show that the generalization error is smaller for models trained with LibriMix than with WHAM!, in both clean and noisy conditions. Aiming towards evaluation in more realistic, conversation-like scenarios, we also release a sparsely overlapping version of LibriMix's test set.
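
As a rough illustration of what "two-speaker mixtures combined with ambient noise" means at the signal level, the sketch below sums two source signals and a noise signal with NumPy. It is not the authors' generation pipeline: the function name `mix_sources`, the synthetic sine tones standing in for LibriSpeech utterances, the random noise standing in for WHAM! recordings, the 16 kHz sample rate, and the choice to truncate to the shortest signal are all assumptions made for this example.

```python
import numpy as np


def mix_sources(sources, noise=None):
    """Sum 1-D source signals sample by sample, optionally adding noise.

    Hypothetical helper: all signals are truncated to the shortest one,
    which is an assumption of this sketch, not a documented LibriMix rule.
    """
    signals = list(sources)
    if noise is not None:
        signals.append(noise)
    n = min(len(s) for s in signals)  # common length after truncation
    mixture = np.zeros(n)
    for s in signals:
        mixture += np.asarray(s, dtype=float)[:n]
    return mixture


# Synthetic stand-ins for two utterances and one noise clip (assumed 16 kHz).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
speaker1 = 0.5 * np.sin(2 * np.pi * 220 * t)
speaker2 = 0.5 * np.sin(2 * np.pi * 330 * t)
noise = 0.05 * rng.standard_normal(24000)  # longer than the speech signals

mixture = mix_sources([speaker1, speaker2], noise)
```

A separation model would take `mixture` as input and be trained to recover `speaker1` and `speaker2`; passing three sources instead of two gives the three-speaker case.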

Keywords

speech separation; generalization; corpora

Domains

Signal and Image Processing [eess.SP]

Main file

cosentino2020.pdf (170.55 KB)

Origin: Files produced by the author(s)

Contributor: Emmanuel Vincent

https://inria.hal.science/hal-03354695

Submitted on: Saturday, September 25, 2021, 19:28:20

Last modified on: Friday, February 23, 2024, 16:18:05

Dates and versions

hal-03354695 , version 1 (25-09-2021)

Identifiers

HAL Id : hal-03354695 , version 1
ARXIV : 2005.11262

Cite

Joris Cosentino, Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent. LibriMix: An open-source dataset for generalizable speech separation. 2020. ⟨hal-03354695⟩


Collections

UNIV-RENNES1 CNRS INRIA IRISA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES SILECS UR1-MATH-NUM

148 views

988 downloads
