Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Communication Dans Un Congrès Année : 2015

Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

Résumé

We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used ' na¨vely ' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76 % average word error rate, which is, to our knowledge, the best score to date.
Fichier principal
Vignette du fichier
weninger_LVA15.pdf (295.58 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-01163493 , version 1 (13-06-2015)

Identifiants

  • HAL Id : hal-01163493 , version 1

Citer

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, et al.. Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR. 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), Aug 2015, Liberec, Czech Republic. ⟨hal-01163493⟩
6116 Consultations
9099 Téléchargements

Partager

Gmail Facebook X LinkedIn More