A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders : Supporting Document

Manuel Pariente; Antoine Deleforge; Emmanuel Vincent

Report (Research Report), Year: 2019

Manuel Pariente

Role: Author
PersonId : 1045122

Laboratoire Lorrain de Recherche en Informatique et ses Applications

Speech Modeling for Facilitating Oral-Based Communication

Antoine Deleforge

Role: Author
PersonId : 10056
IdHAL : antoine-deleforge
ORCID : 0000-0003-0339-7472
IdRef : 184451205

Laboratoire Lorrain de Recherche en Informatique et ses Applications

Speech Modeling for Facilitating Oral-Based Communication

Emmanuel Vincent

Role: Author
PersonId : 1256
IdHAL : emmanuelv
ORCID : 0000-0002-0183-7289
IdRef : 089360176

Laboratoire Lorrain de Recherche en Informatique et ses Applications

Speech Modeling for Facilitating Oral-Based Communication

Abstract

Recent studies have explored the use of deep generative models of speech spectra based on variational autoencoders (VAEs), combined with unsupervised noise models, to perform speech enhancement. These studies developed iterative algorithms involving either Gibbs sampling or gradient descent at each step, making them computationally expensive. This paper proposes a variational inference method to iteratively estimate the power spectrogram of the clean speech. Our main contribution is the analytical derivation of the variational steps in which the encoder of the pre-learned VAE can be used to estimate the variational approximation of the true posterior distribution, using the very same assumption made to train VAEs. Experiments show that the proposed method produces results on par with the aforementioned iterative methods using sampling, while decreasing the computational cost by a factor of 36 to reach a given performance.

Keywords

Speech enhancement; variational autoencoders; variational Bayes; non-negative matrix factorization

Domains

Sound [cs.SD]; Machine Learning [stat.ML]

Main file

support_document_final.pdf (1.03 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02089062

Submitted on: Monday, April 8, 2019, 14:06:29

Last modified on: Thursday, February 1, 2024, 10:05:50

Dates and versions

hal-02089062, version 1 (08-04-2019)

Identifiers

HAL Id: hal-02089062, version 1

Cite

Manuel Pariente, Antoine Deleforge, Emmanuel Vincent. A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders : Supporting Document. [Research Report] RR-9268, INRIA. 2019, pp.1-8. ⟨hal-02089062⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA INRIA-RRRT UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD LARA UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

259 Views

194 Downloads
