Conference paper · Year: 2022

Do self-supervised speech models develop human-like perception biases?

Abstract

Self-supervised models for speech processing form representational spaces without using any external labels. Increasingly, they appear to be a feasible way of at least partially eliminating costly manual annotations, a problem of particular concern for low-resource languages. But what kind of representational spaces do these models construct? Human perception specializes to the sounds of listeners' native languages. Does the same thing happen in self-supervised models? We examine the representational spaces of three kinds of state-of-the-art self-supervised models: wav2vec 2.0, HuBERT and contrastive predictive coding (CPC), and compare them with the perceptual spaces of French-speaking and English-speaking human listeners, both globally and taking account of the behavioural differences between the two language groups. We show that the CPC model exhibits a small native language effect, but that wav2vec 2.0 and HuBERT seem to develop a universal speech perception space which is not language specific. A comparison against the predictions of supervised phone recognisers suggests that all three self-supervised models capture relatively fine-grained perceptual phenomena, while supervised models are better at capturing coarser, phone-level effects of listeners' native languages on perception.
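To make the comparison described in the abstract concrete, the sketch below shows one way to probe a pretrained self-supervised model's representational space with an ABX-style discrimination test, the kind of phone-discrimination setup commonly compared against human perception data. This is a minimal illustration, not the authors' exact pipeline: it mean-pools frame representations and uses cosine similarity, a simplification of frame-wise distance computations, and the audio file paths are hypothetical placeholders.

```python
# Minimal ABX-style probe of a pretrained wav2vec 2.0 representational space.
# Assumptions: mean-pooled cosine similarity stands in for the paper's actual
# distance computation; a_path/b_path/x_path are hypothetical stimulus files.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def embed(path: str) -> torch.Tensor:
    """Return one mean-pooled representation vector for an audio file."""
    wav, sr = torchaudio.load(path)
    wav = torchaudio.functional.resample(wav, sr, 16_000).mean(dim=0)  # mono, 16 kHz
    inputs = extractor(wav.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        frames = model(inputs.input_values).last_hidden_state[0]  # (time, dim)
    return frames.mean(dim=0)  # average over time (a simplification)

def abx_correct(a_path: str, b_path: str, x_path: str) -> bool:
    """X belongs to the same phone category as A; the model 'discriminates'
    correctly if X lies closer to A than to B in representation space."""
    a, b, x = embed(a_path), embed(b_path), embed(x_path)
    cos = torch.nn.functional.cosine_similarity
    return bool(cos(x, a, dim=0) > cos(x, b, dim=0))
```

Aggregating such judgments over many stimulus triples, separately for contrasts that are easy or hard for French-speaking versus English-speaking listeners, is one way to test whether a model's space shows a native-language bias or is language-universal.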
Main file: _ACL__Do_self_supervised_speech_models_develop_human_like_perception_biases__.pdf (1.55 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03697420, version 1 (22-06-2022)

Identifiers

Cite

Juliette Millet, Ewan Dunbar. Do self-supervised speech models develop human-like perception biases?. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, May 2022, Dublin, Ireland. pp.7591-7605, ⟨10.18653/v1/2022.acl-long.523⟩. ⟨hal-03697420⟩