Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Communication Dans Un Congrès Année : 2019

Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations

Résumé

In this paper, we present and evaluate a method for extractive text-based summarization of Arabic videos. The algorithm is proposed in the scope of the AMIS project that aims at helping a user to understand videos given in a foreign language (Arabic). For that, the project proposes several strategies to translate and summarize the videos. One of them consists in transcribing the Ara-bic videos, summarizing the transcriptions, and translating the summary. In this paper we describe the video corpus that was collected from YouTube and present and evaluate the transcription-summarization part of this strategy. Moreover, we present the Automatic Speech Recognition (ASR) system used to transcribe the videos, and show how we adapted this system to the Algerian dialect. Then, we describe how we automatically segment into sentences the sequence of words provided by the ASR system, and how we summarize the obtained sequence of sentences. We evaluate objectively and subjectively our approach. Results show that the ASR system performs well in terms of Word Error Rate on MSA, but needs to be adapted for dealing with Algerian dialect data. The subjective evaluation shows the same behaviour than ASR: transcriptions for videos containing dialectal data were better scored than videos containing only MSA data. However, summaries based on transcriptions are not as well rated, even when transcriptions are better rated. Last, the study shows that features, such as the lengths of transcriptions and summaries, and the subjective score of transcriptions, explain only 31% of the subjective score of summaries.
Fichier principal
Vignette du fichier
ICALP2019AMIS.pdf (142.75 Ko) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-02314238 , version 1 (11-10-2019)

Identifiants

Citer

M A Menacer, C E González-Gallardo, K Abidi, Dominique Fohr, Denis Jouvet, et al.. Extractive Text-Based Summarization of Arabic videos: Issues, Approaches and Evaluations. ICALP: International Conference on Arabic Language Processing, Oct 2019, Nancy, France. pp.65-78, ⟨10.1007/978-3-030-32959-4_5⟩. ⟨hal-02314238⟩
554 Consultations
360 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More