Active Learning for Interactive Relation Extraction in a French Newspaper's Articles - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2021

Active Learning for Interactive Relation Extraction in a French Newspaper's Articles

Abstract

Relation extraction is a subtask of natural language processing that has seen many improvements in recent years with the advent of complex pre-trained architectures. Many of these state-of-the-art approaches are evaluated on benchmarks of labelled sentences containing tagged entities, and require substantial pre-training and fine-tuning on task-specific data. However, in a real use case such as a newspaper company mostly dedicated to local news, relations are of varied and highly specific types, there is virtually no annotated data for them, and many entities co-occur in a sentence without being related. We question the use of supervised state-of-the-art models in such a context, where resources such as time, computing power and human annotators are limited. To adapt to these constraints, we experiment with an active-learning-based relation extraction pipeline consisting of a lightweight binary LSTM-based model that detects which relations actually exist, and a state-of-the-art model for relation classification. We compare several choices of classification model in this scenario, from basic word embedding averaging to graph neural networks and BERT-based models, as well as several active learning acquisition strategies, in order to find the most cost-efficient yet accurate approach for the use case of France's largest daily newspaper company.
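The abstract mentions comparing several active learning acquisition strategies without naming them on this page. As a minimal sketch, assuming the common pool-based uncertainty-sampling setting, the Python below ranks unlabelled entity pairs by how uncertain a model's predicted class distribution is and returns the top k for a human annotator. Least confidence, margin and entropy are standard strategies from the active learning literature, not necessarily the ones evaluated in the paper; the function names, the `pool` dictionary and its probabilities are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Each acquisition function maps a predicted class distribution to a score;
# a higher score means the model is less certain about that example.


def least_confidence(probs: Sequence[float]) -> float:
    """One minus the probability of the most likely class."""
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    """Negated gap between the two most likely classes (needs >= 2 classes)."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return -(top1 - top2)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the predicted distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


STRATEGIES: Dict[str, Callable[[Sequence[float]], float]] = {
    "least_confidence": least_confidence,
    "margin": margin,
    "entropy": entropy,
}


def select_for_annotation(
    pool: Dict[str, List[float]], k: int, strategy: str = "entropy"
) -> List[str]:
    """Return the IDs of the k most uncertain unlabelled pairs in the pool."""
    score = STRATEGIES[strategy]
    ranked = sorted(pool, key=lambda pair_id: score(pool[pair_id]), reverse=True)
    return ranked[:k]


# Hypothetical toy pool: entity-pair IDs mapped to a binary detector's
# [p(related), p(not related)] output.
pool = {
    "pair_a": [0.98, 0.02],  # confident, little to gain from a label
    "pair_b": [0.55, 0.45],  # close to the decision boundary
    "pair_c": [0.70, 0.30],
}

if __name__ == "__main__":
    for name in STRATEGIES:
        print(name, select_for_annotation(pool, k=2, strategy=name))
```

Run as a script, each of the three strategies prints `['pair_b', 'pair_c']` for this toy pool. The same scoring applies whether the distribution comes from the binary detector or from a classifier over relation types; with many relation types the strategies can disagree, since entropy considers the whole distribution while margin only considers the two most likely classes.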
Main file
ranlp2021.pdf (691.28 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03371917, version 1 (2021-10-09)

Identifiers

  • HAL Id: hal-03371917, version 1

Cite

Cyrielle Mallart, Michel Le Nouy, Guillaume Gravier, Pascale Sébillot. Active Learning for Interactive Relation Extraction in a French Newspaper's Articles. RANLP 2021 - Recent Advances in Natural Language Processing, Sep 2021, Online, Bulgaria. pp.886-894. ⟨hal-03371917⟩
