A Recurrent Variational Autoencoder for Speech Enhancement

Simon Leglaive; Xavier Alameda-Pineda; Laurent Girin; Radu Horaud

doi:10.1109/ICASSP40776.2020.9053164

Conference paper, Year: 2020

Simon Leglaive

Role: Author
PersonId : 20853
IdHAL : simon-leglaive
ORCID : 0000-0002-8219-1298
IdRef : 25312171X

CentraleSupélec

Institut d'Électronique et des Technologies du numéRique

Xavier Alameda-Pineda

Role: Author
PersonId : 16186
IdHAL : xavier-alameda-pineda
ORCID : 0000-0002-5354-1084
IdRef : 18450919X

Interpretation and Modelling of Images and Videos

Laurent Girin

Role: Author
PersonId : 3682
IdHAL : laurent-girin
ORCID : 0000-0002-9214-8760
IdRef : 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Interpretation and Modelling of Images and Videos

Radu Horaud

Role: Author
PersonId : 16183
IdHAL : radu-horaud
ORCID : 0000-0001-5232-024X
IdRef : 032302495

Interpretation and Modelling of Images and Videos

Abstract

This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE). The deep generative speech model is trained using clean speech signals only, and it is combined with a nonnegative matrix factorization noise model for speech enhancement. We propose a variational expectation-maximization algorithm in which the encoder of the RVAE is fine-tuned at test time to approximate the distribution of the latent variables given the noisy speech observations. Compared with previous approaches based on feed-forward fully-connected architectures, the proposed recurrent deep generative speech model induces posterior temporal dynamics over the latent variables, which is shown to improve the speech enhancement results.
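As a rough illustration of how a speech variance model and a nonnegative matrix factorization (NMF) noise model can be combined, the sketch below builds a noise variance as a low-rank nonnegative product and applies a Wiener-like gain to a noisy spectrogram. It is not taken from the paper: the Wiener-like gain, the function names, the toy dimensions, and the random `speech_var` array (a stand-in for what a trained RVAE decoder would output) are all assumptions made for illustration.

```python
import numpy as np

def nmf_noise_variance(W, H):
    """Noise variance as a nonnegative low-rank product, shape (F, N)."""
    return W @ H

def wiener_speech_estimate(x_stft, speech_var, noise_var, eps=1e-10):
    """Scale each noisy coefficient by speech_var / (speech_var + noise_var)."""
    gain = speech_var / (speech_var + noise_var + eps)
    return gain * x_stft

rng = np.random.default_rng(0)
F, N, K = 5, 8, 2  # frequency bins, time frames, NMF rank (toy sizes)

# Nonnegative NMF factors for the noise model (random here, learned in practice).
W = rng.random((F, K))
H = rng.random((K, N))

# Hypothetical stand-in for the speech variance produced by a trained decoder.
speech_var = rng.random((F, N)) + 0.1

# Toy complex-valued noisy spectrogram.
x_stft = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))

noise_var = nmf_noise_variance(W, H)
s_hat = wiener_speech_estimate(x_stft, speech_var, noise_var)
```

Because the gain lies between 0 and 1, each estimated coefficient has a magnitude no larger than the corresponding noisy coefficient.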

Keywords

Recurrent variational autoencoders; Speech enhancement; Nonnegative matrix factorization; Variational inference

Domains

Sound [cs.SD]; Signal and Image Processing [eess.SP]; Neural Networks [cs.NE]; Artificial Intelligence [cs.AI]

Main file

LAGH_2020.pdf (391.79 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-02329000

Submitted on: Friday, February 7, 2020, 16:42:22

Last modified on: Thursday, April 4, 2024, 18:22:09

Dates and versions

hal-02329000 , version 1 (23-10-2019)

hal-02329000 , version 2 (07-02-2020)

Identifiers

HAL Id : hal-02329000 , version 2
ARXIV : 1910.10942
DOI : 10.1109/ICASSP40776.2020.9053164

Cite

Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud. A Recurrent Variational Autoencoder for Speech Enhancement. ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, May 2020, Barcelona (virtual), Spain. pp.371-375, ⟨10.1109/ICASSP40776.2020.9053164⟩. ⟨hal-02329000v2⟩

Collections

UNIV-NANTES UNIV-RENNES1 UGA CNRS INRIA INSA-RENNES IRISA GIPSA IETR SUP_IETR LJK LJK_GI LJK_GI_PERCEPTION GIPSA-CRISSP CENTRALESUPELEC IETR-FAST INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE GIPSA-PPC MIAI ANR UR1-MATH-NUM HUB-IA NANTES-UNIVERSITE

452 views

1238 downloads
