Data Augmenting Contrastive Learning of Speech Representations in the Time Domain

Eugene Kharitonov; Morgane Rivière; Gabriel Synnaeve; Lior Wolf; Pierre-Emmanuel Mazaré; Matthijs Douze; Emmanuel Dupoux

Conference paper, Year: 2020

Eugene Kharitonov
Role: Author
PersonId: 1053020
Facebook AI Research [Paris]

Morgane Rivière
Role: Author
Facebook AI Research [Paris]

Gabriel Synnaeve
Role: Author
Facebook AI Research [Paris]

Lior Wolf
Role: Author
PersonId: 1086117
Facebook AI Research [Paris]

Pierre-Emmanuel Mazaré
Role: Author
Facebook AI Research [Paris]

Matthijs Douze
Role: Author
PersonId: 1086118
Facebook AI Research [Paris]

Emmanuel Dupoux
Role: Author
PersonId: 757939
ORCID: 0000-0002-7814-2952
Laboratoire de sciences cognitives et psycholinguistique
Apprentissage machine et développement cognitif

Abstract

Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for learning representations of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise, and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% relative.
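As a rough illustration of the idea in the abstract, the sketch below applies two time-domain augmentations (additive noise and a crude reverberation) to the past portion of a waveform only, leaving the future portion clean. It does not use the WavAugment API. The function names, the synthetic impulse response, and the default parameters are all assumptions made for this example, and pitch modification is left out to keep the sketch dependent on NumPy alone.

```python
import numpy as np


def add_noise(wave, snr_db, rng):
    """Mix in white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(wave ** 2)
    noise = rng.standard_normal(wave.shape)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so signal_power / scaled_noise_power == 10 ** (snr_db / 10).
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return wave + scale * noise


def add_reverb(wave, sample_rate, rng, decay_s=0.05):
    """Convolve with a synthetic, exponentially decaying noise impulse response.

    This is a stand-in for a real room simulation (an assumption of the sketch).
    """
    n = int(sample_rate * decay_s * 4)
    t = np.arange(n) / sample_rate
    impulse = rng.standard_normal(n) * np.exp(-t / decay_s)
    impulse[0] = 1.0  # keep the direct path
    wet = np.convolve(wave, impulse)[: len(wave)]
    # Rescale to the peak level of the input waveform.
    return wet * (np.max(np.abs(wave)) / (np.max(np.abs(wet)) + 1e-12))


def augment_past_only(wave, split, sample_rate, rng, snr_db=10.0):
    """Augment samples before `split`; return the future samples untouched."""
    past, future = wave[:split], wave[split:]
    past_aug = add_noise(add_reverb(past, sample_rate, rng), snr_db, rng)
    return past_aug, future


rng = np.random.default_rng(0)
sample_rate = 16000
# One second of a 220 Hz tone as a placeholder for a speech waveform.
wave = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
past_aug, future = augment_past_only(wave, sample_rate // 2, sample_rate, rng)
```

In a CPC-style setup, the augmented past would feed the context side of the model while the clean future would supply the prediction targets. How the split is made in practice depends on the model architecture, which this sketch does not cover.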

Keywords

Speech recognition; Unsupervised representation learning; Contrastive predictive coding; Data augmentation

Domains

Computation and Language [cs.CL]; Sound [cs.SD]

Main file

2007.00991.pdf (363.03 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-03070321

Submitted on: Tuesday, December 15, 2020, 17:57:53

Last modified on: Friday, April 19, 2024, 16:18:55

Long-term archiving on: Tuesday, March 16, 2021, 20:15:31

Dates and versions

hal-03070321, version 1 (15-12-2020)

Identifiers

HAL Id: hal-03070321, version 1
arXiv: 2007.00991

Cite

Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, et al. Data Augmenting Contrastive Learning of Speech Representations in the Time Domain. SLT 2020 - IEEE Spoken Language Technology Workshop, Dec 2020, Shenzhen / Virtual, China. ⟨hal-03070321⟩

Collections

ENS-PARIS CNRS INRIA EHESS LSCP DEC INRIA2 PSL ANR PRAIRIE-IA