Arabic Statistical N-gram Models - INRIA - Institut National de Recherche en Informatique et en Automatique
Journal article, International Review on Computers and Software (IRECOS), Year: 2009

Arabic Statistical N-gram Models

Karima Meftouh
  • Role: Author
  • PersonId: 857254
Kamel Smaïli
Mohamed Tayeb Laskri
  • Role: Author
  • PersonId: 857255

Abstract

In this work we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to investigate other solutions that do not require increasing the size of the corpus. Word segmentation was applied in order to increase the statistical viability of the corpus. This leads to better performance in terms of normalized perplexity.
File: karima2_IRECOSprprint.pdf (484.94 KB)

Dates and versions

hal-01639807, version 1 (20-11-2017)

Identifiers

  • HAL Id: hal-01639807, version 1

Cite

Karima Meftouh, Kamel Smaïli, Mohamed Tayeb Laskri. Arabic Statistical N-gram Models. International Review on Computers and Software (IRECOS), 2009, 4 (1). ⟨hal-01639807⟩
126 Views
33 Downloads
