Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

Benjamin Lecouteux; Georges Linares; Yannick Estève; Guillaume Gravier

Journal article, IEEE Transactions on Audio, Speech and Language Processing, Year: 2013


Benjamin Lecouteux

Role: Author
PersonId: 7847
IdHAL: benjamin-lecouteux
ORCID: 0000-0003-3000-6190
IdRef: 135355060

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Georges Linares

Role: Author
PersonId: 4977
IdHAL: georges-linares
IdRef: 079368794

Laboratoire Informatique d'Avignon

Yannick Estève

Role: Author
PersonId: 11645
IdHAL: yannick-esteve
ORCID: 0000-0002-3656-8883
IdRef: 070531668

Laboratoire d'Informatique de l'Université du Maine

Guillaume Gravier

Role: Author
PersonId: 1046
IdHAL: guig
ORCID: 0000-0002-2266-5682
IdRef: 110355415

Multimedia content-based indexing

Abstract

Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of their outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach in which the outputs of secondary systems are integrated into the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined with others by a primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain a 14.5% relative improvement in word error rate over the best single system, as opposed to 6.7% with a ROVER combination. An in-depth analysis of DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm: applying DDA to both A* and beam-search-based decoders yields similar performance.
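To make the "relative" figures in the abstract concrete, here is a minimal sketch of how a relative word error rate (WER) improvement is usually computed. The function name and the 20% baseline WER are hypothetical and chosen only for illustration; the record does not state the absolute WER values from the paper.

```python
def relative_wer_reduction(baseline_wer: float, combined_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - combined_wer) / baseline_wer


# Hypothetical baseline WER of the best single system (NOT a figure from the paper).
baseline = 0.20

# Absolute WERs that the quoted relative gains would imply under that assumption.
dda_wer = baseline * (1 - 0.145)    # 14.5% relative gain quoted for DDA
rover_wer = baseline * (1 - 0.067)  # 6.7% relative gain quoted for ROVER

print(f"DDA:   {relative_wer_reduction(baseline, dda_wer):.3f}")
print(f"ROVER: {relative_wer_reduction(baseline, rover_wer):.3f}")
```

Under this assumed baseline, the quoted gains would correspond to absolute WERs of roughly 17.1% for DDA and 18.7% for ROVER; with a different baseline the absolute values change while the relative gains stay the same.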

Domains

Multimedia [cs.MM]

Main file

SystemCombination.pdf (501.03 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-00758626

Submitted on: Thursday, 29 November 2012, 09:32:46

Last modified on: Thursday, 4 April 2024, 18:19:23

Long-term archiving on: Saturday, 17 December 2016, 17:07:28

Dates and versions

hal-00758626, version 1 (29-11-2012)

Identifiers

HAL Id: hal-00758626, version 1

Cite

Benjamin Lecouteux, Georges Linares, Yannick Estève, Guillaume Gravier. Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding. IEEE Transactions on Audio, Speech and Language Processing, 2013. ⟨hal-00758626⟩


Collections

UNIV-AVIGNON EC-PARIS UNIV-RENNES1 UGA CNRS INRIA UNIV-LEMANS INSA-RENNES IRISA LIG LIG_TDCGE LIG_TDCGE_GETALP IRISA-D6 INRIA2 UR1-MATH-STIC LIUM LIUM-LST UR1-UFR-ISTIC LIA UNIV-RENNES INSA-GROUPE UR1-MATH-NUM LIG_SIDCH

721 views

546 downloads
