Multichannel Identification and Nonnegative Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function

Xiaofei Li; Sharon Gannot; Laurent Girin; Radu Horaud

doi:10.1109/TASLP.2018.2839362

Journal article, IEEE/ACM Transactions on Audio, Speech and Language Processing, Year: 2018



Xiaofei Li

Role: Author

Interpretation and Modelling of Images and Videos

Sharon Gannot

Role: Author

Bar-Ilan University [Israel]

Laurent Girin

Role: Author
PersonId : 3682
IdHAL : laurent-girin
ORCID : 0000-0002-9214-8760
IdRef : 088998037

GIPSA - Cognitive Robotics, Interactive Systems, & Speech Processing

Radu Horaud

Role: Author
PersonId : 16183
IdHAL : radu-horaud
ORCID : 0000-0001-5232-024X
IdRef : 032302495

Interpretation and Modelling of Images and Videos

Abstract

This paper addresses the problems of blind multichannel identification and equalization for joint speech dereverberation and noise reduction. The time-domain cross-relation method is hardly applicable to blind room impulse response identification because of the near-common zeros of the long impulse responses. We extend the cross-relation method to the short-time Fourier transform (STFT) domain, in which the time-domain impulse response is approximately represented by the convolutive transfer function (CTF) with far fewer coefficients. For the oversampled STFT, CTFs suffer from common zeros caused by the non-flat frequency response of the STFT window. To overcome this, we propose to identify CTFs using an STFT framework with oversampled signals and critically sampled CTFs, which is a good trade-off between the frequency aliasing of the signals and the common-zeros problem of the CTFs. The identified complex-valued CTFs are not accurate enough for multichannel equalization because of the frequency aliasing of the CTFs. Hence, we use only the CTF magnitudes, which leads to a nonnegative multichannel equalization method based on a nonnegative convolution model between the STFT magnitude of the source signal and the CTF magnitude. Compared with the complex-valued convolution model, this nonnegative convolution model is shown to be more robust against CTF perturbations. To recover the STFT magnitude of the source signal and to reduce the additive noise, the L2-norm fitting error between the STFT magnitude of the microphone signals and the nonnegative convolution is constrained to be less than a noise-power-related tolerance. Meanwhile, the L1-norm of the STFT magnitude of the source signal is minimized to impose sparsity.
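The nonnegative equalization step described in the abstract can be illustrated with a minimal sketch for a single frequency bin. This is not the authors' implementation: it swaps the constrained problem (minimize the L1-norm subject to an L2 fitting-error tolerance) for the closely related penalized, lasso-style form, solved with a plain projected proximal-gradient loop. The function names, the penalty weight `lam`, and the iteration count are hypothetical choices made for illustration, and the CTF magnitudes are assumed to be already identified.

```python
import numpy as np


def nonneg_conv(ctf_mag, src_mag):
    """Convolve one CTF magnitude with the source magnitude along the frame axis,
    truncated to the number of source frames."""
    return np.convolve(ctf_mag, src_mag)[: len(src_mag)]


def nonneg_conv_adjoint(ctf_mag, residual):
    """Adjoint of nonneg_conv with respect to src_mag (a truncated correlation)."""
    q = len(ctf_mag)
    return np.convolve(residual, ctf_mag[::-1])[q - 1 : q - 1 + len(residual)]


def recover_source_magnitude(mic_mags, ctf_mags, lam=0.01, n_iter=2000):
    """Estimate a sparse, nonnegative source STFT magnitude for one frequency bin.

    Assumed (illustrative) objective, the penalized form of the problem:
        0.5 * sum_i ||x_i - a_i * s||^2 + lam * sum(s),  subject to s >= 0
    mic_mags: array (n_mics, n_frames) of microphone STFT magnitudes.
    ctf_mags: array (n_mics, n_taps) of CTF magnitudes.
    """
    mic_mags = np.asarray(mic_mags, dtype=float)
    ctf_mags = np.asarray(ctf_mags, dtype=float)
    n_frames = mic_mags.shape[1]

    # Each convolution operator has spectral norm at most the sum of its taps,
    # so this step size does not exceed 1 / (Lipschitz constant of the gradient).
    step = 1.0 / np.sum(ctf_mags.sum(axis=1) ** 2)

    s = np.zeros(n_frames)
    for _ in range(n_iter):
        grad = np.zeros(n_frames)
        for x, a in zip(mic_mags, ctf_mags):
            grad += nonneg_conv_adjoint(a, nonneg_conv(a, s) - x)
        # Gradient step, L1 shrinkage, and projection onto s >= 0 in one line.
        s = np.maximum(s - step * (grad + lam), 0.0)
    return s


if __name__ == "__main__":
    # Synthetic two-microphone example with a sparse source magnitude.
    rng = np.random.default_rng(0)
    source = np.zeros(60)
    source[[5, 20, 41]] = [1.0, 0.6, 0.8]
    ctfs = rng.uniform(0.1, 1.0, size=(2, 4))
    mics = np.stack([nonneg_conv(a, source) for a in ctfs])
    estimate = recover_source_magnitude(mics, ctfs, lam=1e-3)
    print("largest estimated frames:", np.argsort(estimate)[-3:])
```

In this penalized form, a larger `lam` plays the role of a looser fitting tolerance: it favors sparser estimates at the cost of a larger fitting error. In a full system the same routine would run independently for each frequency bin.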

Keywords

convolutive transfer function; audio source separation; source separation; lasso optimization; short-time Fourier transform; speech enhancement

Domains

Signal and Image Processing [eess.SP]; Machine Learning [cs.LG]; Sound [cs.SD]

Main file

ctf_dereverberation.pdf (2.02 MB)

Origin: Files produced by the author(s)


https://inria.hal.science/hal-01645749

Submitted on: Monday, May 14, 2018, 12:03:16

Last modified on: Thursday, April 4, 2024, 21:01:13

Dates and versions

hal-01645749 , version 1 (23-11-2017)

hal-01645749 , version 2 (26-02-2018)

hal-01645749 , version 3 (14-05-2018)

Identifiers

HAL Id : hal-01645749 , version 3
ARXIV : 1711.07911
DOI : 10.1109/TASLP.2018.2839362

Cite

Xiaofei Li, Sharon Gannot, Laurent Girin, Radu Horaud. Multichannel Identification and Nonnegative Equalization for Dereverberation and Noise Reduction based on Convolutive Transfer Function. IEEE/ACM Transactions on Audio, Speech and Language Processing, 2018, 26 (10), pp.1755-1768. ⟨10.1109/TASLP.2018.2839362⟩. ⟨hal-01645749v3⟩


Collections

UNIV-RENNES1 UGA CNRS INRIA IRISA GIPSA GIPSA-DPC LJK LJK_GI LJK_GI_PERCEPTION GIPSA-CRISSP INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM

583 views

1387 downloads
