Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2014

Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech

Arseniy Gorin, Denis Jouvet

Abstract

Speaker variability is a well-known problem for state-of-the-art Automatic Speech Recognition (ASR) systems. In particular, handling children's speech is challenging because of substantial differences in the pronunciation of speech units between adult and child speakers. To build accurate ASR systems for all types of speakers, Hidden Markov Models (HMMs) with Gaussian mixture densities have been used intensively in combination with model adaptation techniques. This paper compares different ways to improve the recognition of children's speech and describes a novel approach relying on a Class-Structured Gaussian Mixture Model (GMM). A common solution for reducing speaker variability relies on gender and age adaptation. First, it is proposed to replace gender and age by unsupervised clustering, and the resulting speaker classes are used for adapting the conventional HMM. Second, the speaker classes are used for initializing a structured GMM, where the components of the Gaussian densities are structured with respect to the speaker classes. In a first approach, the mixture weights of the structured GMM are set dependent on the speaker class. In a second approach, the mixture weights are replaced by explicit dependencies between the Gaussian components of the mixture densities (as in stranded GMMs, but here the GMMs are class-structured). The different approaches are evaluated and compared on the TIDIGITS task. The best improvement is achieved when the structured GMM is combined with feature adaptation.
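The two structured-GMM variants named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives no equations, so the formulation below is an assumption. It assumes diagonal-covariance Gaussians shared across speaker classes, a per-class weight table for the first approach, and a component-to-component transition matrix between consecutive frames for the second (in the spirit of stranded GMMs). All names (`class_weighted_loglik`, `sequence_loglik_with_dependencies`, the toy means and weights) are hypothetical.

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a))) along the given axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def log_gauss_diag(x, means, variances):
    """Log-density of frame x (D,) under each of M diagonal Gaussians (M, D)."""
    d = x.shape[-1]
    diff = x - means
    return -0.5 * (d * np.log(2.0 * np.pi)
                   + np.sum(np.log(variances), axis=1)
                   + np.sum(diff ** 2 / variances, axis=1))

def class_weighted_loglik(x, means, variances, class_weights, c):
    """Assumed first approach: shared components, weights chosen by class c.

    log p(x | c) = log sum_m class_weights[c, m] * N(x; mean_m, var_m)
    """
    return logsumexp(np.log(class_weights[c]) + log_gauss_diag(x, means, variances))

def sequence_loglik_with_dependencies(frames, means, variances, init_weights, trans):
    """Assumed second approach: the component used at frame t depends on the
    component used at frame t-1 through trans[i, j] = P(j at t | i at t-1).
    Computed with a forward recursion in the log domain.
    """
    log_alpha = np.log(init_weights) + log_gauss_diag(frames[0], means, variances)
    for x in frames[1:]:
        # Sum over the previous component i for every current component j.
        propagated = logsumexp(log_alpha[:, None] + np.log(trans), axis=0)
        log_alpha = propagated + log_gauss_diag(x, means, variances)
    return logsumexp(log_alpha)

# Toy setup (hypothetical values): 4 one-dimensional components, where
# components 0-1 are associated with class 0 and components 2-3 with class 1.
means = np.array([[-2.0], [-1.0], [1.0], [2.0]])
variances = np.ones((4, 1))
class_weights = np.array([[0.45, 0.45, 0.05, 0.05],
                          [0.05, 0.05, 0.45, 0.45]])

x = np.array([-1.5])
ll_class0 = class_weighted_loglik(x, means, variances, class_weights, 0)
ll_class1 = class_weighted_loglik(x, means, variances, class_weights, 1)

frames = np.array([[-1.5], [-1.0], [-2.2]])
# A transition matrix whose rows all equal the class-0 weights carries no
# dependency, so the recursion reduces to independent frame likelihoods.
flat_trans = np.tile(class_weights[0], (4, 1))
ll_seq = sequence_loglik_with_dependencies(
    frames, means, variances, class_weights[0], flat_trans)
ll_indep = sum(class_weighted_loglik(f, means, variances, class_weights, 0)
               for f in frames)
```

In this sketch the class structure lives only in which components receive most of the weight for each class. Replacing the weight vector with a transition matrix lets a sequence of frames tend to stay within the components of one class without the class being given explicitly.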
Main file
ago_slsp_2014_v4-juin2014.pdf (280.38 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01090472, version 1 (03-12-2014)

Identifiers

HAL Id: hal-01090472
DOI: 10.1007/978-3-319-11397-5_8

Cite

Arseniy Gorin, Denis Jouvet. Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech. SLSP 2014, 2nd International Conference on Statistical Language and Speech Processing, Oct 2014, Grenoble, France. pp. 108-119, ⟨10.1007/978-3-319-11397-5_8⟩. ⟨hal-01090472⟩
