Tackling Language Modelling Bias in Support of Linguistic Diversity

Gábor Bella; Paula Helm; Gertraud Koch; Fausto Giunchiglia

Conference Papers Year : 2024

Gábor Bella

Function : Correspondent author
PersonId : 1343031
IdHAL : gabor-bella
ORCID : 0000-0002-3868-1740

Département Logique des Usages, Sciences sociales et Sciences de l'Information

Equipe DECIDE

Paula Helm

Function : Author
PersonId : 1343029
ORCID : 0000-0002-2719-9721

University of Amsterdam (Universiteit van Amsterdam)

Gertraud Koch

Function : Author

University of Hamburg

Fausto Giunchiglia

Function : Author
PersonId : 1019906

University of Trento (Università degli Studi di Trento)

Abstract

Current AI-based language technologies (language models, machine translation systems, multilingual dictionaries and corpora) are known to focus on the world’s 2-3% most widely spoken languages. Research efforts of the past decade have attempted to expand this coverage to ‘under-resourced languages.’ The goal of our paper is to bring attention to a corollary phenomenon that we call language modelling bias: multilingual language processing systems often exhibit a hardwired, yet usually involuntary and hidden, representational preference towards certain languages. We define language modelling bias as uneven per-language performance under similar test conditions. We show that bias stems not only from technology but also from ethically problematic research and development methodologies that disregard the needs of language communities. Moving towards diversity-aware alternatives, we present an initiative that aims at reducing language modelling bias within lexical resources through both technology design and methodology, based on an eye-level collaboration with local communities.
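The abstract defines language modelling bias as uneven per-language performance under similar test conditions, without committing to a particular statistic. As a minimal sketch, assuming each language has already been scored on a comparable benchmark with a higher-is-better metric in [0, 1], the hypothetical helper `summarise_bias` below reports the best- and worst-served languages, the spread between their scores, and the worst-to-best ratio. The function and class names, the choice of spread and ratio, and the toy scores are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BiasSummary:
    """Hypothetical summary of how unevenly a system serves its languages."""
    best_language: str
    worst_language: str
    spread: float               # best score minus worst score; 0.0 means even
    worst_to_best_ratio: float  # 1.0 means even; NaN if the best score is 0
    mean_score: float


def summarise_bias(scores: dict[str, float]) -> BiasSummary:
    """Summarise per-language scores obtained under the same test conditions.

    Assumption (not from the paper): `scores` maps a language code to a
    higher-is-better metric in [0, 1], e.g. accuracy on a parallel benchmark.
    """
    if len(scores) < 2:
        raise ValueError("need scores for at least two languages")
    # Ties resolve to the first language encountered in the dict.
    best_lang = max(scores, key=scores.__getitem__)
    worst_lang = min(scores, key=scores.__getitem__)
    best, worst = scores[best_lang], scores[worst_lang]
    ratio = worst / best if best > 0 else float("nan")
    return BiasSummary(
        best_language=best_lang,
        worst_language=worst_lang,
        spread=best - worst,
        worst_to_best_ratio=ratio,
        mean_score=mean(scores.values()),
    )


if __name__ == "__main__":
    # Toy scores for illustration only; not measurements from the paper.
    toy = {"en": 0.90, "hu": 0.72, "sw": 0.45}
    print(summarise_bias(toy))
```

For the toy input, the summary names `en` as best served and `sw` as worst served, with a spread of about 0.45 and a worst-to-best ratio of about 0.5. Spread and ratio only look at the two extremes; a dispersion measure over all languages, such as the variance, would also reflect how the languages in between are served.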

Keywords

language modeling bias; linguistic diversity; low-resource languages; natural language processing; value-sensitive design

Domains

Artificial Intelligence [cs.AI]; Computation and Language [cs.CL]; Ethics

Main file

FAccT_PREPRINT__Tackling_Language_Modelling_Bias_to_Support_Linguistic_Diversity-1.pdf (543.8 KB)

Origin : Files produced by the author(s)

https://hal.science/hal-04564896

Submitted on : Wednesday, May 1, 2024, 12:28:29 AM

Last modification on : Sunday, May 5, 2024, 3:15:22 AM

Dates and versions

hal-04564896 , version 1 (01-05-2024)

Licence

Attribution - NonCommercial - NoDerivatives

Identifiers

HAL Id : hal-04564896 , version 1

Cite

Gábor Bella, Paula Helm, Gertraud Koch, Fausto Giunchiglia. Tackling Language Modelling Bias in Support of Linguistic Diversity. FAccT 2024, ACM, Jun 2024, Rio de Janeiro, Brazil. ⟨hal-04564896⟩

Collections

UNIV-BREST, INSTITUT-TELECOM, CNRS, LAB-STICC_UBO, ETHIQUE, ENIB, LAB-STICC, IMT-ATLANTIQUE, LAB-STICC_DECIDE, LAB-STICC_DMID
