Data Augmenting Contrastive Learning of Speech Representations in the Time Domain - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, year: 2020

Abstract

Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past context is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% (relative).
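The abstract names three time-domain augmentations (pitch modification, additive noise, reverberation) and reports that augmenting the past context works best. The sketch below illustrates two of them in plain Python, applied only to the samples before a split point so that the future segment CPC must predict stays clean. It is not the WavAugment API: the function names (`add_noise`, `reverberate`, `augment_past`), the synthetic exponentially decaying impulse response, and the SNR and RT60 ranges are hypothetical choices made for illustration, and pitch modification is omitted.

```python
import math
import random
from typing import List, Tuple


def add_noise(signal: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise scaled so the result has the requested SNR (in dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    noise_power = sum(n * n for n in noise) / len(noise)
    # Choose the gain so that signal_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(signal, noise)]


def reverberate(signal: List[float], sample_rate: int, rt60: float,
                rng: random.Random) -> List[float]:
    """Convolve with a synthetic impulse response (direct path + decaying noise tail)."""
    num_taps = max(1, int(rt60 * sample_rate))
    # Amplitude envelope falls by 60 dB (a factor of 1000) after rt60 seconds.
    decay = math.log(1000.0) / (rt60 * sample_rate)
    ir = [1.0] + [rng.gauss(0.0, 1.0) * math.exp(-decay * k) for k in range(1, num_taps)]
    # Normalize the impulse response to unit energy so loudness stays comparable.
    norm = math.sqrt(sum(h * h for h in ir))
    ir = [h / norm for h in ir]
    # Causal convolution, truncated to the input length.
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, num_taps)):
            acc += ir[k] * signal[n - k]
        out.append(acc)
    return out


def augment_past(waveform: List[float], split: int, sample_rate: int,
                 rng: random.Random) -> Tuple[List[float], List[float]]:
    """Augment only the past context (samples before `split`); keep the future clean."""
    past, future = waveform[:split], waveform[split:]
    # Hypothetical parameter ranges, chosen for illustration only.
    past = add_noise(past, snr_db=rng.uniform(5.0, 15.0), rng=rng)
    past = reverberate(past, sample_rate, rt60=rng.uniform(0.02, 0.1), rng=rng)
    return past, list(future)


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16000
    # 0.1 s of a 220 Hz sine as a stand-in for a speech waveform.
    wave = [math.sin(2 * math.pi * 220 * t / sr) for t in range(1600)]
    past, future = augment_past(wave, split=1200, sample_rate=sr, rng=rng)
    print(len(past), len(future))  # prints "1200 400"
```

Keeping the future segment untouched means the prediction targets are computed from clean audio while the context the model conditions on is perturbed; a real pipeline would operate on batched tensors rather than Python lists.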
Main file
2007.00991.pdf (363.03 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03070321, version 1 (15 Dec 2020)

Cite

Eugene Kharitonov, Morgane Rivière, Gabriel Synnaeve, Lior Wolf, Pierre-Emmanuel Mazaré, et al. Data Augmenting Contrastive Learning of Speech Representations in the Time Domain. SLT 2020 - IEEE Spoken Language Technology Workshop, Dec 2020, Shenzhen / Virtual, China. ⟨hal-03070321⟩