A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2013

A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription

Denis Jouvet, David Langlois

Abstract

This paper introduces a new approach, based on neural networks, for selecting the vocabulary of a speech transcription system. Large sets of text data can now be collected from web sources and used, in addition to more traditional text sources, to build the language models of speech transcription systems. However, web sources yield large amounts of heterogeneous data. As a consequence, standard vocabulary selection procedures based on unigram approaches tend to select unwanted items as new words. As an alternative to unigram-based and empirical, manually designed selection approaches, this paper proposes a selection procedure that relies on a machine learning technique, namely neural networks. The paper presents and discusses the results obtained with the various selection procedures. The neural-network-based selection experiments are promising, and the approach can automatically take various kinds of detailed information into account during selection.
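The contrast drawn in the abstract can be illustrated with a small sketch. The abstract does not specify the network architecture, its input features, or how training labels were obtained, so everything below is an assumption rather than the authors' method. The unigram baseline ranks words by a weighted mixture of per-source unigram probabilities, which is one common form of unigram-based selection. The learned selector is a single sigmoid neuron, i.e. logistic regression, the smallest possible neural network, trained by stochastic gradient descent. It stands in for the paper's neural network. The helper names (`unigram_select`, `features`, `train_neuron`, `nn_select`) are hypothetical. The per-word features are also hypothetical choices: a smoothed log relative frequency in each text source, and the fraction of alphabetic characters.

```python
import math
from collections import Counter


def unigram_select(corpora, weights, size):
    """Baseline: keep the `size` words with the highest weighted mixture of
    per-source unigram probabilities (ties broken alphabetically)."""
    scores = Counter()
    for corpus, weight in zip(corpora, weights):
        counts = Counter(corpus)
        total = sum(counts.values())
        for word, count in counts.items():
            scores[word] += weight * count / total
    ranked = sorted(scores, key=lambda wd: (-scores[wd], wd))
    return ranked[:size]


def features(word, source_counts):
    """Hypothetical features: for each source, log((count + 1) / (total + 1)),
    so unseen words get a finite value; then the fraction of alphabetic
    characters in the word (web junk such as '!!!' scores low)."""
    feats = []
    for counts in source_counts:
        total = sum(counts.values())
        feats.append(math.log((counts[word] + 1) / (total + 1)))
    feats.append(sum(ch.isalpha() for ch in word) / max(len(word), 1))
    return feats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_neuron(xs, ys, epochs=500, lr=0.1):
    """Fit a single sigmoid neuron (logistic regression) with stochastic
    gradient descent on the cross-entropy loss; ys are 1 (keep) or 0 (reject)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - y  # derivative of the cross-entropy loss w.r.t. the pre-activation
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def nn_select(candidates, source_counts, w, b, threshold=0.5):
    """Keep the candidates whose predicted probability of being a useful
    vocabulary word exceeds `threshold`, preserving input order."""
    return [
        word
        for word in candidates
        if sigmoid(b + sum(wi * xi for wi, xi in zip(w, features(word, source_counts))))
        > threshold
    ]
```

In use, one would label a set of candidate words and compute their `features` vectors. The labels could, for instance, record whether each word belongs to a reference vocabulary, which is again an assumption. One would then fit `train_neuron` on those vectors and pass the learned `w` and `b` to `nn_select`. Unlike the unigram baseline, which only sees frequencies, the learned selector can weigh several heterogeneous cues at once. This mirrors, in miniature, the abstract's point that a learned model can take various kinds of detailed information into account.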
No file deposited

Dates and versions

hal-00834302, version 1 (14-06-2013)

Identifiers

  • HAL Id: hal-00834302, version 1

Cite

Denis Jouvet, David Langlois. A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription. TSD - 16th International Conference on Text, Speech and Dialogue - 2013, Sep 2013, Pilsen, Czech Republic. pp. 60-67. ⟨hal-00834302⟩