Detecting and counting overlapping speakers in distant speech scenarios - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2020

Detecting and counting overlapping speakers in distant speech scenarios

Abstract

We consider the problem of detecting speech activity and counting overlapping speakers in distant-microphone recordings. We treat supervised Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), joint VAD+OSD, and speaker counting as instances of a general Overlapped Speech Detection and Counting (OSDC) task, and we design a Temporal Convolutional Network (TCN) based method to address it. We show that TCNs significantly outperform state-of-the-art methods on two real-world distant speech datasets. In particular, our best architecture obtains, for OSD, 29.1% and 25.5% absolute improvement in Average Precision over previous techniques on the AMI and CHiME-6 datasets, respectively. Furthermore, we find that generalization for joint VAD+OSD improves when a speaker counting objective is used rather than a VAD+OSD objective. We also study the effectiveness of forced-alignment-based labeling and data augmentation, and show that both can improve OSD performance.
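The view of VAD, OSD, joint VAD+OSD, and speaker counting as instances of one task can be illustrated with frame-level labels: each target is a different thresholding of the per-frame number of active speakers. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the 0.1 s frame hop, and the cap of 4 speakers for the counting classes are hypothetical choices not taken from the abstract.

```python
from typing import List, Tuple

# (start_s, end_s) of one utterance by one speaker; hypothetical input format.
Segment = Tuple[float, float]


def count_speakers(segments: List[Segment], n_frames: int, hop_s: float) -> List[int]:
    """Return the number of active speakers in each frame."""
    counts = [0] * n_frames
    for start, end in segments:
        first = max(0, int(round(start / hop_s)))
        last = min(n_frames, int(round(end / hop_s)))
        for t in range(first, last):
            counts[t] += 1
    return counts


def to_vad(counts: List[int]) -> List[int]:
    """VAD target: 1 if at least one speaker is active."""
    return [int(c >= 1) for c in counts]


def to_osd(counts: List[int]) -> List[int]:
    """OSD target: 1 if at least two speakers are active."""
    return [int(c >= 2) for c in counts]


def to_vad_osd(counts: List[int]) -> List[int]:
    """Joint VAD+OSD target: 0 = silence, 1 = one speaker, 2 = overlap."""
    return [min(c, 2) for c in counts]


def to_counting(counts: List[int], max_speakers: int = 4) -> List[int]:
    """Counting target; the cap max_speakers is an assumed value."""
    return [min(c, max_speakers) for c in counts]


# Two utterances that overlap between 0.2 s and 0.3 s.
counts = count_speakers([(0.0, 0.3), (0.2, 0.5)], n_frames=6, hop_s=0.1)
print(counts)              # [1, 1, 2, 1, 1, 0]
print(to_vad(counts))      # [1, 1, 1, 1, 1, 0]
print(to_osd(counts))      # [0, 0, 1, 0, 0, 0]
print(to_vad_osd(counts))  # [1, 1, 2, 1, 1, 0]
```

Because the VAD, OSD, and VAD+OSD targets can all be derived from the counts, a model trained on the counting target can be evaluated on the other tasks by merging its output classes.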
Main file
cornell_IS20.pdf (255.49 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02908241, version 1 (28-07-2020)
hal-02908241, version 2 (13-10-2021)

Identifiers

  • HAL Id: hal-02908241, version 2

Cite

Samuele Cornell, Maurizio Omologo, Stefano Squartini, Emmanuel Vincent. Detecting and counting overlapping speakers in distant speech scenarios. INTERSPEECH 2020, Oct 2020, Shanghai, China. ⟨hal-02908241v2⟩