First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT - INRIA (Institut National de Recherche en Informatique et en Automatique)
Conference paper, 2021

Abstract

Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model's internal representations, we show that multilingual BERT, a popular multilingual language model, can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific, language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little impact on the transfer and can be reinitialized during fine-tuning. We present extensive experiments with three distinct tasks, seventeen typologically diverse languages and multiple domains to support our hypothesis.
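The reinitialization idea in the abstract can be sketched in miniature: keep the lower layers (the putative multilingual encoder) intact and re-randomize the top layers (the putative task predictor) before fine-tuning. The snippet below is a minimal illustration of that operation on a toy stack of weight matrices; the model, layer count, and function names are hypothetical stand-ins, not the paper's released code or real mBERT weights.

```python
import random

# Toy stand-in for a 12-layer transformer: each "layer" is a small
# weight matrix represented as nested lists of floats.
NUM_LAYERS = 12
HIDDEN = 4

def init_layer(rng):
    """Draw a fresh HIDDEN x HIDDEN weight matrix."""
    return [[rng.gauss(0.0, 0.02) for _ in range(HIDDEN)]
            for _ in range(HIDDEN)]

def build_model(seed=0):
    """Build the full layer stack with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [init_layer(rng) for _ in range(NUM_LAYERS)]

def reinit_top_layers(model, k, seed=1):
    """Re-initialize the top k layers in place, leaving the lower
    layers (the 'encoder' in the paper's terminology) untouched."""
    rng = random.Random(seed)
    for i in range(len(model) - k, len(model)):
        model[i] = init_layer(rng)
    return model

model = build_model()
# Deep-copy the lower layers so we can check they survive unchanged.
lower_before = [[row[:] for row in layer] for layer in model[:8]]
reinit_top_layers(model, k=4)
assert model[:8] == lower_before  # encoder layers preserved
```

In the actual experiments one would apply the same pattern to a pretrained checkpoint (e.g. resetting the top transformer blocks of mBERT to their random initialization before fine-tuning), which is how the paper probes which sub-network carries the cross-lingual signal.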
Main file: 2021.eacl-main.189.pdf (766.53 KB)
Origin: publisher files authorized on an open archive

Dates and versions

hal-03239087 , version 1 (27-05-2021)

Identifiers

  • HAL Id : hal-03239087 , version 1

Cite

Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah. First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT. EACL 2021 - The 16th Conference of the European Chapter of the Association for Computational Linguistics, Apr 2021, Kyiv / Virtual, Ukraine. ⟨hal-03239087⟩