Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, 2020


Abstract

Current automatic speech recognition (ASR) systems trained on native speech often perform poorly when applied to non-native or accented speech. In this work, we propose to compute x-vector-like accent embeddings and use them as auxiliary inputs to an acoustic model trained on native data only in order to improve the recognition of multi-accent data comprising native, non-native, and accented speech. In addition, we leverage untranscribed accented training data by means of semi-supervised learning. Our experiments show that acoustic models trained with the proposed accent embeddings outperform those trained with conventional i-vector or x-vector speaker embeddings, and achieve a 15% relative word error rate (WER) reduction on non-native and accented speech w.r.t. acoustic models trained with regular spectral features only. Semi-supervised training using just 1 hour of untranscribed speech per accent yields an additional 15% relative WER reduction w.r.t. models trained on native data only.
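As a minimal sketch of the auxiliary-input idea described above, the snippet below tiles a fixed utterance-level accent embedding across time and appends it to the per-frame spectral features before they are fed to the acoustic model. The function name, dimensions, and NumPy-based formulation are illustrative assumptions, not the authors' implementation (which uses x-vector-like embeddings inside a full ASR pipeline).

```python
import numpy as np

def append_accent_embedding(features, accent_embedding):
    """Illustrative helper (hypothetical, not the paper's code):
    tile a fixed accent embedding across all frames and append it
    to the per-frame spectral features as an auxiliary input."""
    num_frames = features.shape[0]
    tiled = np.tile(accent_embedding, (num_frames, 1))   # shape (T, D_emb)
    return np.concatenate([features, tiled], axis=1)     # shape (T, D_feat + D_emb)

# Example: 200 frames of 40-dim filterbank features plus a 100-dim embedding
feats = np.random.randn(200, 40)
emb = np.random.randn(100)
augmented = append_accent_embedding(feats, emb)
print(augmented.shape)  # (200, 140)
```

The same concatenation works for i-vector or x-vector speaker embeddings, which the paper compares against; only the embedding extractor changes, not the way it enters the acoustic model.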
Main file: cameraReady_2742.pdf (432.88 KB)
Origin: files produced by the author(s)

Dates and versions

hal-02907929, version 1 (02-08-2020)

Identifiers

  • HAL Id: hal-02907929, version 1

Cite

Mehmet Ali Tuğtekin Turan, Emmanuel Vincent, Denis Jouvet. Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation. INTERSPEECH 2020, Oct 2020, Shanghai, China. ⟨hal-02907929⟩
