Conference paper, Year: 2018

Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings

Abstract

We present a source separation system for high-order ambisonics (HOA) content. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. As inputs to the LSTM, we combine one channel of the mixture with the outputs of basic HOA beamformers, assuming that the directions of arrival of the directional sources are known. In our experiments, the speech of interest can be corrupted either by diffuse noise or by an equally loud competing speaker. We show that adding the output of the beamformer steered toward the competing speech as an input, in addition to that of the beamformer steered toward the target speech, brings significant improvements in terms of word error rate.
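As a rough illustration of the pipeline described in the abstract, the sketch below (not the authors' implementation; the layer sizes, feature stacking, and the final masking step are assumptions) feeds the magnitude spectra of one mixture channel and of two beamformer outputs, steered toward the target and the competing speaker, to an LSTM that predicts a time-frequency mask. For brevity the mask is applied directly to the target beamformer output, whereas the paper derives a multichannel spatial filter from the estimated mask.

    # Illustrative sketch only, assuming PyTorch and STFT inputs of shape (frames, n_freq).
    import torch
    import torch.nn as nn

    class MaskEstimator(nn.Module):
        def __init__(self, n_freq=513, n_inputs=3, hidden=256):
            super().__init__()
            # Per-frame magnitude spectra of the mixture channel and of the two
            # beamformer outputs, stacked along the feature axis.
            self.lstm = nn.LSTM(n_inputs * n_freq, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_freq)

        def forward(self, feats):              # feats: (batch, frames, n_inputs * n_freq)
            h, _ = self.lstm(feats)
            return torch.sigmoid(self.out(h))  # mask in [0, 1]: (batch, frames, n_freq)

    def separate(mix_ch, bf_target, bf_interf, model):
        # mix_ch, bf_target, bf_interf: complex STFTs of shape (frames, n_freq).
        feats = torch.cat([mix_ch.abs(), bf_target.abs(), bf_interf.abs()], dim=-1)
        mask = model(feats.unsqueeze(0)).squeeze(0)   # (frames, n_freq)
        # Direct masking of the target beamformer output; the paper instead uses
        # the mask to build a multichannel spatial filter.
        return mask * bf_target

    if __name__ == "__main__":
        T, F = 100, 513
        model = MaskEstimator()
        mix  = torch.randn(T, F, dtype=torch.cfloat)
        bf_t = torch.randn(T, F, dtype=torch.cfloat)
        bf_i = torch.randn(T, F, dtype=torch.cfloat)
        print(separate(mix, bf_t, bf_i, model).shape)  # torch.Size([100, 513])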
Main file: 2018-Perotin-Multichannel_speech_separation_hoa.pdf (347.49 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01699759, version 1 (05-02-2018)
hal-01699759, version 2 (30-04-2018)

Identifiers

  • HAL Id: hal-01699759, version 2

Cite

Lauréline Perotin, Romain Serizel, Emmanuel Vincent, Alexandre Guérin. Multichannel speech separation with recurrent neural networks from high-order ambisonics recordings. 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2018), Apr 2018, Calgary, Canada. ⟨hal-01699759v2⟩
