Conference paper, Year: 2017

About vocabulary adaptation for automatic speech recognition of video data

Abstract

This paper discusses the adaptation of vocabularies for automatic speech recognition. The context is the transcription of videos in French, English and Arabic. Baseline automatic speech recognition systems have been developed using available data. However, the available text data, including the GigaWord corpora from LDC, are now rather dated relative to the recent videos that are to be transcribed. The paper presents the collection of recent textual data from the Internet for updating the speech recognition vocabularies and training the language models, as well as the construction of the development data sets needed for the vocabulary selection process. The paper also compares the coverage of the training data collected from the Internet, and of the GigaWord data, using finite-size vocabularies made of the most frequent words. Finally, the paper presents and discusses the number of out-of-vocabulary word occurrences, before and after updating the vocabularies, for the three languages.
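The two quantities the abstract refers to, a finite-size vocabulary made of the most frequent words and the rate of out-of-vocabulary (OOV) word occurrences on development data, can be illustrated with a minimal sketch. This is not the authors' selection procedure, which is not described on this page. It assumes whitespace-tokenised text, and the function names `top_n_vocabulary` and `oov_rate` as well as the toy sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def top_n_vocabulary(training_tokens, n):
    """Return the set of the n most frequent word types in training_tokens.

    Assumption: a plain frequency cut-off; the paper may use a different
    selection criterion.
    """
    counts = Counter(training_tokens)
    return {word for word, _ in counts.most_common(n)}


def oov_rate(vocabulary, dev_tokens):
    """Return the fraction of token occurrences in dev_tokens not in vocabulary."""
    if not dev_tokens:
        raise ValueError("dev_tokens must not be empty")
    oov_count = sum(1 for token in dev_tokens if token not in vocabulary)
    return oov_count / len(dev_tokens)


# Toy data (invented for this example): an older corpus, a more recent
# corpus, and a small development text standing in for recent transcripts.
old_text = "the minister said the vote was held in the capital".split()
recent_text = "the minister said the brexit vote was held after the brexit debate".split()
dev_text = "the brexit vote was held".split()

# The size limit is larger than the toy corpora, so every word type is kept.
old_vocab = top_n_vocabulary(old_text, 100)
updated_vocab = top_n_vocabulary(old_text + recent_text, 100)

print("OOV rate before update:", oov_rate(old_vocab, dev_text))
print("OOV rate after update:", oov_rate(updated_vocab, dev_text))
```

In this toy setting the only OOV word before the update is one that appears solely in the recent text, which mirrors the motivation stated in the abstract: older corpora miss words that recent videos use.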
Main file
AboutTaskAdaptation-v1.2-upload.01November2017.pdf (1.14 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01649057, version 1 (27-11-2017)

Identifiers

  • HAL Id: hal-01649057, version 1

Cite

Denis Jouvet, David Langlois, Mohamed Amine Menacer, Dominique Fohr, Odile Mella, et al. About vocabulary adaptation for automatic speech recognition of video data. ICNLSSP'2017 - International Conference on Natural Language, Signal and Speech Processing, Dec 2017, Casablanca, Morocco. pp.1-5. ⟨hal-01649057⟩
