First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT

Benjamin Muller; Yanai Elazar; Benoît Sagot; Djamé Seddah

Preprint, Working Paper. Year: 2021

Benjamin Muller

Role: Author

Automatic Language Modelling and ANAlysis & Computational Humanities

Yanai Elazar

Role: Author

Department of Computer Science [Bar Ilan]

Benoît Sagot

Role: Author
PersonId: 1461
IdHAL: bsagot
ORCID: 0000-0002-0107-8526
IdRef: 177454229

Automatic Language Modelling and ANAlysis & Computational Humanities

Djamé Seddah

Role: Author
PersonId: 11545
IdHAL: djameseddah
IdRef: 086185136

Automatic Language Modelling and ANAlysis & Computational Humanities

Abstract

Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model's internal representations, we show that multilingual BERT, a popular multilingual language model, can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance for the transfer and can be reinitialized during fine-tuning. We present extensive experiments with three distinct tasks, seventeen typologically diverse languages and multiple domains to support our hypothesis.

Domains

Document and Text Processing

Contributor: Djamé Seddah

https://inria.hal.science/hal-03161685

Submitted on: Monday, March 8, 2021, 00:20:52

Last modified on: Thursday, February 1, 2024, 10:05:18

Dates and versions

hal-03161685, version 1 (08-03-2021)

Identifiers

HAL Id: hal-03161685, version 1
arXiv: 2101.11109

Cite

Benjamin Muller, Yanai Elazar, Benoît Sagot, Djamé Seddah. First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT. 2021. ⟨hal-03161685⟩

Collections

UNIV-RENNES1 INRIA IRISA INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES ANR PRAIRIE-IA UR1-MATH-NUM
