Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2021

Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?

Abstract

Cognate prediction is the task of generating, in a given language, the likely cognates of words in a related language, where cognates are words in related languages that have evolved from a common ancestor word. It is a task for which little data exists and which can help linguists discover previously unknown relations. Previous work has applied machine translation (MT) techniques to this task, based on the tasks' similarities, without, however, studying their numerous differences or optimising architectural choices and hyper-parameters. In this paper, we investigate whether cognate prediction can benefit from insights from low-resource MT. We first compare statistical MT (SMT) and neural MT (NMT) architectures in a bilingual setup. We then study the impact of employing data augmentation techniques commonly seen to give gains in low-resource MT: monolingual pretraining, backtranslation and multilinguality. Our experiments on several Romance languages show that cognate prediction behaves only to a certain extent like a standard low-resource MT task. In particular, MT architectures, both statistical and neural, can be successfully used for the task, but using supplementary monolingual data is not always as beneficial as using additional language data, contrary to what is observed for MT.
Main file
Is_Cognate_Prediction_a_Low_Resource_Machine_Translation_Task__ACL2021Findings-2.pdf (566.03 KB)
2021Aug_ACLFindings_Poster.pdf (837.48 KB)
4a2b9ebbfc0e00f0b33ea5d69cb949f4.pdf (1 MB)
Origin: Files produced by the author(s)
Format: Poster

Dates and versions

hal-03243380, version 1 (31-05-2021)
hal-03243380, version 2 (15-12-2022)

Identifiers

  • HAL Id: hal-03243380, version 1

Cite

Clémentine Fourrier, Rachel Bawden, Benoît Sagot. Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task? ACL-IJCNLP 2021 - Findings of the Association for Computational Linguistics, Aug 2021, Bangkok, Thailand. ⟨hal-03243380v1⟩