An empirical study of the Algerian dialect of Social network - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2017

An empirical study of the Algerian dialect of Social network

Abstract

In this paper, we present an analysis of the use of the Algerian dialect on YouTube. To do so, we harvested a corpus of 17M words. This corpus was then used to extract a comparable Algerian corpus, named CALYOU, by aligning pairs of sentences written in Latin and Arabic script. CALYOU was built using a multilingual word embeddings approach. Several experiments were conducted to set the parameters of the Continuous Bag of Words approach; they are discussed in this article. The proposed method achieved a recall of 41%. We then present several figures on the collected data that led to several unexpected results: 51% of the vocabulary words are written in Latin script, and 82% of all comments exhibit code-switching.
Main file
ICNLSSP2017_paper_16.pdf (394.01 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01659997, version 1 (09-12-2017)

Identifiers

  • HAL Id: hal-01659997, version 1

Cite

Karima Abidi, Kamel Smaïli. An empirical study of the Algerian dialect of Social network. ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing, Dec 2017, Casablanca, Morocco. ⟨hal-01659997⟩
260 views
292 downloads
