How to handle gender and number agreement in statistical language models? - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2006

How to handle gender and number agreement in statistical language models?

Abstract

Gender and number agreement is a critical problem in statistical language modeling. One of the main difficulties in French speech recognition is the presence of misrecognized words caused by incorrect gender and number agreement between words. Statistical language models do not treat this phenomenon directly. This paper focuses on how to handle agreement. We introduce an original model called Features-Cache (FC) that estimates the gender and number of the word to predict. The cache is dynamic and variable in length: its size is determined automatically according to syntagm delimiters. The main advantage of this model is that it requires no syntactic parsing: it is used like any other statistical language model. Several variants were evaluated, and the best one achieves an improvement of approximately 9 points in perplexity. This model has been integrated into a speech recognition system based on the JULIUS engine. Tests were carried out on 280 sentences provided by AUPELF for the French automatic speech recognition evaluation campaign. The new model outperforms the baseline by 3% in terms of word error.
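To make the idea of a variable-length cache bounded by syntagm delimiters concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the delimiter set, the class name `FeaturesCache`, and the add-alpha smoothing are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to (gender, number) features.
# A real system would use a full morphological lexicon.
LEXICON = {
    "la": ("f", "sg"),
    "petite": ("f", "sg"),
    "maison": ("f", "sg"),
    "le": ("m", "sg"),
    "chiens": ("m", "pl"),
}

# Assumed syntagm delimiters; the delimiter set used in the paper is not given here.
DELIMITERS = {",", ".", "et", "qui"}

# All (gender, number) combinations the cache can predict.
FEATURES = [(g, n) for g in ("m", "f") for n in ("sg", "pl")]


class FeaturesCache:
    """Stores the features of words seen since the last syntagm delimiter."""

    def __init__(self):
        self.cache = []

    def update(self, word):
        # A delimiter closes the current syntagm, so the cache is emptied;
        # this is what makes the cache length variable.
        if word in DELIMITERS:
            self.cache.clear()
        elif word in LEXICON:
            self.cache.append(LEXICON[word])

    def feature_probs(self, alpha=1.0):
        # Relative frequency of each feature pair in the cache,
        # with add-alpha smoothing (an assumption, not the paper's estimator).
        counts = Counter(self.cache)
        total = len(self.cache) + alpha * len(FEATURES)
        return {f: (counts[f] + alpha) / total for f in FEATURES}


fc = FeaturesCache()
for w in ["la", "petite"]:
    fc.update(w)
probs = fc.feature_probs()
print(probs[("f", "sg")])
```

In this sketch, the distribution returned by `feature_probs` could be used to rescale the probabilities a baseline n-gram model assigns to candidate words, favoring those whose features agree with the current syntagm. How the paper actually combines the two models is not stated in the abstract.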
Main file
ICSLP06.pdf (72 KB)

Dates and versions

inria-00103497 , version 1 (04-10-2006)

Identifiers

  • HAL Id : inria-00103497 , version 1

Cite

Caroline Lavecchia, Kamel Smaïli, Jean-Paul Haton. How to handle gender and number agreement in statistical language models?. Ninth International Conference on Spoken Language Processing - INTERSPEECH 2006, Sep 2006, Pittsburgh, Pennsylvania/USA. ⟨inria-00103497⟩
