Towards unsupervised learning of speech features in the wild

Morgane Rivière; Emmanuel Dupoux

Conference paper, 2020


Morgane Rivière

Role: Author

Affiliation: Facebook AI Research [Paris]

Emmanuel Dupoux

Role: Author
PersonId: 757939
ORCID: 0000-0002-7814-2952

Affiliations: Laboratoire de sciences cognitives et psycholinguistique; Apprentissage machine et développement cognitif

Abstract

Recent work on unsupervised contrastive learning of speech representations has shown promising results, but has so far been applied mostly to clean, curated speech datasets. Can it also be used with unprepared audio data "in the wild"? Here, we explore three potential problems in this setting: (i) the presence of non-speech data, (ii) noisy or low-quality speech data, and (iii) an imbalanced speaker distribution. We show that on the Libri-light train set, which is itself a relatively clean, speech-only dataset, these problems combined can already degrade the ABX score by up to 30% relative. We show that the first two problems can be alleviated by data filtering: voice activity detection selects speech segments, while the perplexity of a model trained on clean data helps discard entire files. We show that the third problem can be alleviated by learning a speaker embedding in the predictive branch of the model. Finally, we show that these techniques yield more robust speech features that can be transferred to an ASR task in the low-resource setting.
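The two filtering steps named in the abstract (segment selection by voice activity detection, file rejection by perplexity) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple energy-threshold VAD as a stand-in for whatever detector the paper uses. It also assumes that per-frame natural-log probabilities from a model trained on clean data are already available for each file. The function names `energy_vad`, `perplexity` and `keep_files`, as well as the frame length and thresholds, are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def energy_vad(samples: Sequence[float], frame_len: int,
               threshold: float) -> List[Tuple[int, int]]:
    """Hypothetical energy-based VAD (a stand-in, not the paper's detector).

    Returns (start, end) sample indices of runs of consecutive frames whose
    mean energy exceeds `threshold`. Trailing samples that do not fill a
    whole frame are ignored.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and start is None:
            start = i * frame_len                    # a speech run begins
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))  # the run ends here
            start = None
    if start is not None:                            # run reaches the end
        segments.append((start, n_frames * frame_len))
    return segments


def perplexity(log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability) over one file."""
    return math.exp(-sum(log_probs) / len(log_probs))


def keep_files(file_log_probs: Dict[str, Sequence[float]],
               max_ppl: float) -> List[str]:
    """Keep files whose perplexity under the clean-data model is <= max_ppl.

    `max_ppl` is an assumed fixed ceiling; the abstract gives no threshold.
    """
    return sorted(name for name, lp in file_log_probs.items()
                  if perplexity(lp) <= max_ppl)


if __name__ == "__main__":
    # Silence, then two loud frames, then silence: one speech segment.
    print(energy_vad([0, 0, 1, 1, 1, 1, 0, 0], frame_len=2, threshold=0.5))
    # File "a" has perplexity 2, file "b" has perplexity 10: only "a" is kept.
    print(keep_files({"a": [math.log(0.5)] * 3,
                      "b": [math.log(0.1)] * 3}, max_ppl=5.0))
```

The VAD operates inside files and trims non-speech stretches. The perplexity test operates on whole files and drops those the clean-data model finds unlikely, mirroring the split of roles described in the abstract. A percentile cutoff over the corpus would be an equally plausible rejection rule. The abstract does not say which rule or value was used.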

Keywords

Speech recognition; Unsupervised representation learning; Contrastive predictive coding; Data filtering; Speaker adaptation

Domains

Computer Science [cs]; Artificial Intelligence [cs.AI]

Main file

Riviere_D_2020_Towards_CPC_in_the_wild.SLT.pdf (214.32 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-03070411

Submitted on: Tuesday, 15 December 2020, 18:41:00

Last modified on: Monday, 18 March 2024, 10:24:06

Long-term archiving on: Tuesday, 16 March 2021, 20:19:21

Dates and versions

hal-03070411, version 1 (15-12-2020)

Identifiers

HAL Id: hal-03070411, version 1

Cite

Morgane Rivière, Emmanuel Dupoux. Towards unsupervised learning of speech features in the wild. SLT 2020: IEEE Spoken Language Technology Workshop, Dec 2020, Shenzhen / Virtual, China. ⟨hal-03070411⟩


Collections

ENS-PARIS CNRS INRIA EHESS LSCP DEC INRIA2 PSL ANR PRAIRIE-IA
