Fusion methods for speech enhancement and audio source separation - INRIA - Institut National de Recherche en Informatique et en Automatique
Journal article, IEEE Transactions on Audio, Speech and Language Processing, Year: 2016

Fusion methods for speech enhancement and audio source separation

Méthodes de fusion pour le rehaussement de la parole et la séparation de source audio

Abstract

A wide variety of audio source separation techniques exist and can already tackle many challenging industrial issues. However, in contrast with other application domains, fusion principles have rarely been investigated in audio source separation, despite their demonstrated potential in classification tasks. In this paper, we propose a general fusion framework that takes advantage of the diversity of existing separation techniques to improve separation quality. We obtain new source estimates by summing the individual estimates given by different separation techniques, weighted by a set of fusion coefficients. We investigate three alternative fusion methods, based on standard non-linear optimization, Bayesian model averaging, or deep neural networks. Experiments conducted for both speech enhancement and singing voice extraction demonstrate that all the proposed methods outperform traditional model selection. Notably, the use of deep neural networks to estimate time-varying coefficients leads to large quality improvements, up to 3 dB in terms of signal-to-distortion ratio (SDR) compared to model selection.
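
The weighted-sum fusion described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fuse`, `select` and `sdr_db` are hypothetical, the estimates are treated as plain sample sequences, the fusion coefficients are assumed to be given rather than estimated by one of the three proposed methods, and `sdr_db` is a simplified energy-ratio measure rather than the full SDR metric used in the paper.

```python
import math


def fuse(estimates, coeffs):
    """Weighted sum of K source estimates, sample by sample.

    estimates: K sequences of equal length N, one per separation technique.
    coeffs: K numbers (time-invariant weights) or K sequences of length N
            (time-varying weights). Both layouts are assumptions of this sketch.
    """
    n = len(estimates[0])
    fused = [0.0] * n
    for est, w in zip(estimates, coeffs):
        # Expand a scalar weight so both cases share the same loop.
        weights = [w] * n if isinstance(w, (int, float)) else w
        for t in range(n):
            fused[t] += weights[t] * est[t]
    return fused


def select(estimates, k):
    """Model selection as a special case of fusion: one-hot coefficients."""
    one_hot = [1.0 if i == k else 0.0 for i in range(len(estimates))]
    return fuse(estimates, one_hot)


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over error energy."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy usage: two estimates whose errors point in opposite directions.
n = 200
source = [math.sin(0.05 * t) for t in range(n)]
error = [0.1 * math.cos(0.19 * t) for t in range(n)]
est_a = [s + e for s, e in zip(source, error)]
est_b = [s - 0.5 * e for s, e in zip(source, error)]

fused = fuse([est_a, est_b], [0.5, 0.5])
print(sdr_db(source, est_a), sdr_db(source, est_b), sdr_db(source, fused))
```

In this toy case the two errors partly cancel under equal weights, so the fused estimate scores higher than either individual estimate, which is something one-hot model selection cannot achieve. Real separation errors are not guaranteed to cancel this way.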
Main file
taslp16.pdf (877.07 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01120685 , version 1 (04-03-2015)
hal-01120685 , version 2 (20-10-2015)
hal-01120685 , version 3 (20-02-2016)
hal-01120685 , version 4 (09-04-2016)

Cite

Xabier Jaureguiberry, Emmanuel Vincent, Gael Richard. Fusion methods for speech enhancement and audio source separation. IEEE Transactions on Audio, Speech and Language Processing, 2016, ⟨10.1109/TASLP.2016.2553441⟩. ⟨hal-01120685v4⟩