CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning

Fabian-Robert Stöter; Soumitro Chakrabarty; Bernd Edler; Emanuël A. P. Habets

doi:10.1109/TASLP.2018.2877892

Journal article: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019


Fabian-Robert Stöter

Role: Author
PersonId: 741450
IdHAL: fabian-robert-stoter
ORCID: 0000-0002-2534-1165

Scientific Data Management

Soumitro Chakrabarty

Role: Author

International Audio Laboratories Erlangen

Bernd Edler

Role: Author

International Audio Laboratories Erlangen

Emanuël A. P. Habets

Role: Author
PersonId: 938660

International Audio Laboratories Erlangen

Abstract

Estimating the maximum number of concurrent speakers from single-channel mixtures is a challenging problem and an essential first step to address various audio-based tasks such as blind source separation, speaker diarization, and audio surveillance. We propose a unifying probabilistic paradigm, where deep neural network architectures are used to infer output posterior distributions. These probabilities are in turn processed to yield discrete point estimates. Designing such architectures often involves two important and complementary aspects that we investigate and discuss. First, we study how recent advances in deep architectures may be exploited for the task of speaker count estimation. In particular, we show that convolutional recurrent neural networks outperform recurrent networks used in a previous study when adequate input features are used. Even for short segments of speech mixtures, we can estimate up to five speakers, with a significantly lower error than other methods. Second, through comprehensive evaluation, we compare the best-performing method to several baselines and assess the influence of gain variations, different data sets, and reverberation. The output of our proposed method is compared to human performance. Finally, we give insights into the strategy used by our proposed method.
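The step from an output posterior distribution to a discrete point estimate can be illustrated with a minimal sketch. It assumes the network emits one probability per candidate speaker count (index 0 meaning no speaker, up to index 5), and it shows two common decision rules, the most probable count and the posterior median. The function names, the choice of rules, and the example probabilities are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def map_estimate(posterior: Sequence[float]) -> int:
    """Return the count with the highest posterior probability.

    Ties are resolved in favour of the smallest count.
    """
    if not posterior:
        raise ValueError("posterior must not be empty")
    return max(range(len(posterior)), key=lambda k: posterior[k])


def median_estimate(posterior: Sequence[float]) -> int:
    """Return the smallest count whose cumulative probability reaches one half."""
    total = sum(posterior)
    if total <= 0:
        raise ValueError("posterior must have positive mass")
    cumulative = 0.0
    for k, p in enumerate(posterior):
        cumulative += p
        if cumulative >= 0.5 * total:
            return k
    return len(posterior) - 1


# Hypothetical posterior over 0..5 concurrent speakers (made-up values).
example_posterior = [0.02, 0.08, 0.25, 0.40, 0.20, 0.05]
map_count = map_estimate(example_posterior)
median_count = median_estimate(example_posterior)
print(map_count, median_count)
```

The two rules agree for this example; they can differ when the posterior is skewed, which is one reason the decision rule is worth treating as a separate design choice from the network itself.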

Keywords

cocktail party; overlap detection; number of concurrent speakers; speaker count estimation

Domains

Logic [math.LO]

Main file

stoeter_sourcecount_arxiv.pdf (1.05 MB)

Origin: Files produced by the author(s)

Contributor: Isabelle Gouat

https://hal-lirmm.ccsd.cnrs.fr/lirmm-02010805

Submitted on: Monday, March 2, 2020, 19:37:14

Last modified on: Monday, April 15, 2024, 11:18:12

Long-term archiving on: Wednesday, June 3, 2020, 16:52:59

Dates and versions

lirmm-02010805, version 1 (02-03-2020)

Identifiers

HAL Id: lirmm-02010805, version 1
DOI: 10.1109/TASLP.2018.2877892

Cite

Fabian-Robert Stöter, Soumitro Chakrabarty, Bernd Edler, Emanuël A. P. Habets. CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019, 27 (2), pp. 268-282. ⟨10.1109/TASLP.2018.2877892⟩. ⟨lirmm-02010805⟩


Collections

CNRS INRIA ZENITH LIRMM INRIA2 MIPS UNIV-MONTPELLIER
