Exploring Conditional Language Model Based Data Augmentation Approaches For Hate Speech Classification - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2021


Abstract

Deep Neural Network (DNN)-based classifiers have gained increasing attention in hate speech classification. However, the performance of DNN classifiers improves with the quantity of available training data, whereas in practice, hate speech datasets contain only a small amount of labeled data. To counter this, Data Augmentation (DA) techniques are often used to increase the number of labeled samples and thereby improve the classifier's performance. In this article, we explore the augmentation of training samples using a conditional language model. Our approach uses a single class-conditioned Generative Pre-trained Transformer 2 (GPT-2) language model for DA, avoiding the need for multiple class-specific GPT-2 models. We study the effect of increasing the quantity of augmented data and show that adding a few hundred samples significantly improves the classifier's performance. Furthermore, we evaluate the effect of filtering the generated data used for DA. Our approach achieves relative improvements in macro-averaged F1 of up to 7.3% and up to 25.0% on two widely used hate speech corpora.
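The abstract describes the pipeline only at a high level. The following is a minimal, hypothetical sketch in plain Python (no model code) of three ingredients it mentions: formatting labeled samples so that one class-conditioned language model can be fine-tuned on all classes at once, filtering generated samples, and computing macro-averaged F1 and a relative improvement. The label-prefix format, the `<sep>`/`<eos>` markers, and the classifier-confidence filter with its `threshold` parameter are assumptions chosen for illustration, not the authors' exact method; the GPT-2 fine-tuning and sampling steps themselves are omitted.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical markers: the abstract does not specify the exact input format.
SEP = "<sep>"
EOS = "<eos>"


def to_conditional_example(label: str, text: str) -> str:
    """Prefix a labeled sample with its class so that a single LM
    can be fine-tuned on every class at once (assumed format)."""
    return f"{label} {SEP} {text} {EOS}"


def generation_prompt(label: str) -> str:
    """Prompt that conditions the fine-tuned LM on the desired class."""
    return f"{label} {SEP} "


def parse_generated(output: str) -> Optional[Tuple[str, str]]:
    """Recover (label, text) from one generated string; None if malformed."""
    if SEP not in output:
        return None
    label, rest = output.split(SEP, 1)
    text = rest.split(EOS, 1)[0].strip()  # drop anything after the end marker
    return (label.strip(), text) if text else None


def filter_generated(
    samples: Sequence[Tuple[str, str]],
    classifier: Callable[[str], Dict[str, float]],
    threshold: float = 0.9,
) -> List[Tuple[str, str]]:
    """Hypothetical filter: keep a sample only if a classifier trained on the
    original data gives its conditioning label probability >= threshold."""
    return [(y, x) for y, x in samples if classifier(x).get(y, 0.0) >= threshold]


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def relative_improvement(baseline: float, augmented: float) -> float:
    """Gain measured relative to the baseline score (0.25 means +25%)."""
    return (augmented - baseline) / baseline
```

Under these assumptions, the fine-tuned model would be prompted with `generation_prompt(label)`, and its continuations passed through `parse_generated` and then `filter_generated` before being added to the training set. Because a relative improvement is measured against the baseline, a hypothetical macro-F1 rising from 0.40 to 0.50 is a 25% relative gain but only 10 points absolute.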
Main file
Article_on_DA_for_HAL.pdf (305.8 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03244472, version 1 (1 June 2021)

Identifiers

  • HAL Id: hal-03244472, version 1

Cite

Ashwin Geet d'Sa, Irina Illina, Dominique Fohr, Dietrich Klakow, Dana Ruiter. Exploring Conditional Language Model Based Data Augmentation Approaches For Hate Speech Classification. TSD 2021 - 24th International Conference on Text, Speech and Dialogue, Sep 2021, Olomouc, Czech Republic. ⟨hal-03244472⟩