Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

Felix Weninger; Hakan Erdogan; Shinji Watanabe; Emmanuel Vincent; Jonathan Le Roux; John R. Hershey; Björn Schuller

Conference paper, Year: 2015

Felix Weninger

Role: Author

Technische Universität München (Technical University of Munich)

Hakan Erdogan

Role: Author

Mitsubishi Electric Research Laboratories

Sabanci University [Istanbul]

Shinji Watanabe

Role: Author

Mitsubishi Electric Research Laboratories

Emmanuel Vincent

Role: Author
PersonId: 1256
IdHAL: emmanuelv
ORCID: 0000-0002-0183-7289
IdRef: 089360176

Speech Modeling for Facilitating Oral-Based Communication

Jonathan Le Roux

Role: Author

Mitsubishi Electric Research Laboratories

John R. Hershey

Role: Author

Mitsubishi Electric Research Laboratories

Björn Schuller

Role: Author

Imperial College London

Abstract

We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used 'naïvely' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score to date.
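For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how the word error rate of a single utterance is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions and are not taken from the paper; how the reported average is formed over the test data is not stated in this record.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name), not code from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {100 * wer:.2f}%")
```

Because insertions count as errors, this ratio can exceed 100% when the hypothesis is much longer than the reference.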

Domains

Signal and Image Processing [eess.SP]

Main file

weninger_LVA15.pdf (295.58 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01163493

Submitted on: Saturday, June 13, 2015, 10:46:45

Last modified on: Thursday, February 1, 2024, 10:06:27

Long-term archiving on: Monday, September 14, 2015, 10:05:47

Dates and versions

hal-01163493 , version 1 (13-06-2015)

Identifiers

HAL Id: hal-01163493, version 1

Cite

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, et al. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Aug 2015, Liberec, Czech Republic. ⟨hal-01163493⟩

Collections

UNIV-RENNES1 CNRS INRIA IRISA GRID5000 UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES SILECS UR1-MATH-NUM
