What you can cram into a single \$&!#* vector: Probing sentence embeddings for linguistic properties - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2018

What you can cram into a single \$&!#* vector: Probing sentence embeddings for linguistic properties

Abstract

Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. However, the complexity of these tasks makes it difficult to infer what kind of information is present in the representations. We introduce here 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods.
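The probing methodology described in the abstract amounts to training a simple classifier on frozen sentence embeddings to predict a linguistic property; if the classifier succeeds, the property is recoverable from the embedding. Below is a minimal, hypothetical sketch of this idea using synthetic data and a NumPy logistic-regression probe; the embeddings, the "sentence length" property, and all names are illustrative assumptions, not the paper's actual tasks or encoders.

```python
# Hypothetical sketch of a probing task: fit a simple linear classifier
# on fixed sentence embeddings to predict a surface property (here, a
# synthetic "long vs. short sentence" label). All data is fabricated
# for illustration; this is not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these are frozen sentence embeddings (n_sentences x dim).
n, dim = 200, 16
lengths = rng.integers(5, 30, size=n)              # property to probe
X = rng.normal(size=(n, dim))
X[:, 0] = lengths + rng.normal(scale=0.5, size=n)  # property leaks into one dimension

# Binary probing label: "long" vs. "short" sentence.
y = (lengths > 17).astype(int)

# Standardize features so gradient descent converges quickly.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Minimal logistic-regression probe trained by gradient descent.
w = np.zeros(dim)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
    w -= lr * (X.T @ (p - y)) / n           # gradient of the log loss
    b -= lr * (p - y).mean()

accuracy = (((X @ w + b) > 0).astype(int) == y).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

High probe accuracy indicates the property is linearly decodable from the embedding; the deliberate point of using such a weak classifier is that success can only come from information already present in the representation, not from the probe itself.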

Dates and versions

hal-01898412, version 1 (18-10-2018)

Identifiers

Cite

Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni. What you can cram into a single \$&!#* vector: Probing sentence embeddings for linguistic properties. ACL 2018 - 56th Annual Meeting of the Association for Computational Linguistics, Jul 2018, Melbourne, Australia. pp.2126-2136. ⟨hal-01898412⟩
214 views
0 downloads
