An alternative scheme for perplexity estimation and its assessment for the evaluation of language models

Frédéric Bimbot; Marc El Bèze; Stéphane Igounet; Michèle Jardino; Kamel Smaïli; Imed Zitouni

Journal article, Computer Speech and Language, Year: 2001

Author affiliations

Frédéric Bimbot: Institut de Recherche en Informatique et Systèmes Aléatoires
Marc El Bèze: Laboratoire Informatique d'Avignon
Stéphane Igounet: Laboratoire Informatique d'Avignon
Michèle Jardino: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Kamel Smaïli: Analysis, perception and recognition of speech
Imed Zitouni: Analysis, perception and recognition of speech

Abstract

Language models are usually evaluated on test texts using the perplexity derived from the likelihood function computed on these texts (test set perplexity). To use this measure in a comparative evaluation campaign, we have developed an alternative scheme for estimating the test set perplexity. The method is derived from the Shannon game and is based on gambling on the next word of a truncated sentence. We also study the entropy bounds proposed by Shannon, which are based on the rank of the correct answer, in order to estimate a perplexity interval for non-probabilistic language models. The relevance of the approach is validated on an example. We then report the results of a preliminary comparative evaluation using the proposed scheme.

Keywords

Shannon game; perplexity; alternative perplexity

Domains

Other [cs.OH]

https://inria.hal.science/inria-00100687

Submitted on: Tuesday, 26 September 2006, 14:49:19

Last modified on: Saturday, 7 October 2023, 21:36:20

Dates and versions

inria-00100687 , version 1 (26-09-2006)

Identifiers

HAL Id : inria-00100687 , version 1

Cite

Frédéric Bimbot, Marc El Bèze, Stéphane Igounet, Michèle Jardino, Kamel Smaïli, et al. An alternative scheme for perplexity estimation and its assessment for the evaluation of language models. Computer Speech and Language, 2001, 15 (1), pp.1-13. ⟨inria-00100687⟩

Collections

UNIV-AVIGNON EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA LIMSI UNIV-LORRAINE INRIA2 LORIA UR1-MATH-STIC UR1-UFR-ISTIC LIA UNIV-RENNES INSA-GROUPE SORBONNE-UNIVERSITE UR1-MATH-NUM LISN
