Do Sentence Embeddings Capture Discourse Properties of Sentences from Scientific Abstracts?
Conference paper. Year: 2020


Abstract

We introduce four tasks designed to determine which sentence encoders best capture discourse properties of sentences from scientific abstracts, namely coherence between clauses of a sentence and discourse relations within sentences. We show that even though contextual encoders such as BERT or SciBERT encode the coherence in discourse units, they do not help to predict three discourse relations commonly used in scientific abstracts. We discuss what these results reveal, namely that these discourse relations are based on particular phrasings that allow non-contextual encoders to perform well.
Main file
CODI2020.pdf (505.99 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03000308, version 1 (11-11-2020)

Identifiers

  • HAL Id: hal-03000308, version 1

Cite

Laurine Huber, Chaker Memmadi, Mathilde Dargnat, Yannick Toussaint. Do Sentence Embeddings Capture Discourse Properties of Sentences from Scientific Abstracts? CODI 2020 - EMNLP 1st Workshop on Computational Approaches to Discourse, Nov 2020, Punta Cana, Dominican Republic. ⟨hal-03000308⟩
