Creating multi-scripts sentiment analysis lexicons for Algerian, Moroccan and Tunisian dialects - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2021

Creating multi-scripts sentiment analysis lexicons for Algerian, Moroccan and Tunisian dialects

Abstract

In this article, we tackle the issue of sentiment analysis in three Maghrebi dialects used on social networks. More precisely, we are interested in analysing sentiment in Algerian, Moroccan and Tunisian corpora. To do this, we automatically built three sentiment lexicons, one for each dialect. Each lexicon is composed of words with their polarities; a dialect word can be written in either Arabic or Latin script. These lexicons may include French or English words as well as words in Arabic dialect and Standard Arabic. The semantic orientation of a word, represented by an embedding vector, is determined automatically by computing its distance to the embeddings of several seed words. The embedding vectors are trained on three large corpora collected from YouTube. The proposed approach is evaluated using the few existing annotated corpora in Tunisian and Moroccan dialects. For the Algerian dialect, in addition to a small corpus we found in the literature, we collected and annotated a corpus of 10k comments extracted from YouTube. This corpus is a valuable resource, which we make freely available.
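The abstract states only that a word's polarity is derived from its distance to several seed-word embeddings; it does not give the distance measure, the aggregation rule, or the seed list. The Python sketch below assumes cosine similarity and scores each word as its mean similarity to positive seeds minus its mean similarity to negative seeds, assigning a label only when the score clears a margin. The functions `cosine`, `orientation` and `build_lexicon`, the `margin` parameter, the seed words and the 2-D vectors are all hypothetical illustrations, not the authors' method or data.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def orientation(word_vec: Vector, pos_seeds: List[Vector], neg_seeds: List[Vector]) -> float:
    """Assumed score: mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(cosine(word_vec, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(word_vec, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


def build_lexicon(
    embeddings: Dict[str, Vector],
    pos_words: List[str],
    neg_words: List[str],
    margin: float = 0.0,
) -> Dict[str, str]:
    """Label every non-seed word whose score exceeds +margin (positive) or falls below -margin (negative)."""
    pos_seeds = [embeddings[w] for w in pos_words]
    neg_seeds = [embeddings[w] for w in neg_words]
    # Seed words keep their given polarity.
    lexicon: Dict[str, str] = {w: "positive" for w in pos_words}
    lexicon.update({w: "negative" for w in neg_words})
    for word, vec in embeddings.items():
        if word in lexicon:
            continue
        score = orientation(vec, pos_seeds, neg_seeds)
        if score > margin:
            lexicon[word] = "positive"
        elif score < -margin:
            lexicon[word] = "negative"
        # Words inside [-margin, margin] are treated as neutral and left out.
    return lexicon


if __name__ == "__main__":
    # Toy 2-D vectors standing in for trained embeddings (illustrative only).
    toy_embeddings = {
        "mezyan": [1.0, 0.1],   # hypothetical positive seed (Latin script)
        "khayeb": [0.1, 1.0],   # hypothetical negative seed (Latin script)
        "مزيان": [0.9, 0.2],    # Arabic-script variant
        "bien": [0.8, 0.3],     # French word
        "mauvais": [0.2, 0.9],  # French word
        "table": [0.5, 0.5],    # equally similar to both seeds
    }
    lexicon = build_lexicon(toy_embeddings, ["mezyan"], ["khayeb"], margin=0.05)
    for word, polarity in lexicon.items():
        print(word, polarity)
```

With these toy vectors, the Arabic-script مزيان and the French bien are labelled positive, mauvais negative, and table, being equally similar to both seeds, is left out. Because the score depends only on vector geometry, words in either script or language can be labelled as long as they share one embedding space.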
Main file
Papiersentiment_analysis_of_Maghrebi.pdf (276.06 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03308111, version 1 (29-07-2021)

Identifiers

  • HAL Id: hal-03308111, version 1

Cite

Karima Abidi, Kamel Smaïli. Creating multi-scripts sentiment analysis lexicons for Algerian, Moroccan and Tunisian dialects. 7th International Conference on Data Mining (DTMN 2021), Computer Science Conference Proceedings in Computer Science & Information Technology (CS & IT), Sep 2021, Copenhagen, Denmark. ⟨hal-03308111⟩
84 views
125 downloads
