Enhancement of esophageal speech using statistical and neuromimetic voice conversion techniques

Imen Ben Othmane; Joseph Di Martino; Kais Ouni

Journal article, Journal of International Science and General Applications, Year: 2018



Imen Ben Othmane

Role: Author

Unité de Recherche Systèmes Mécatroniques et Signaux

Statistical Machine Translation and Speech Modelization and Text

Joseph Di Martino

Role: Author
PersonId : 16557
IdHAL : joseph-di-martino
IdRef : 179331531

Statistical Machine Translation and Speech Modelization and Text

Kais Ouni

Role: Author

Unité de Recherche Systèmes Mécatroniques et Signaux

Abstract

This paper presents a novel approach to enhancing esophageal speech using voice conversion techniques. Esophageal speech (ES) is an alternative voice that allows a patient without vocal cords to produce sounds after total laryngectomy. Although it requires no external device, it sounds unnatural compared with laryngeal speech. ES is frequently described as harsh, with low pitch and low loudness, and consequently has poor intelligibility and poor quality. To improve its naturalness and intelligibility, we propose a speaking-aid system that makes ES clearer and more natural. Given the specificity of ES, we apply a new voice conversion technique that takes into account the particularities of the pathological vocal apparatus. The vocal tract and excitation cepstral coefficients are estimated separately. We trained deep neural networks (DNNs) and Gaussian mixture models (GMMs) to predict "laryngeal" vocal tract features from esophageal speech. The converted cepstral vectors are then used to estimate excitation and phase coefficients by searching the target training space, which has previously been encoded as a binary tree. The resynthesized voice sounds like a laryngeal voice, i.e., it is more natural than the original ES, with an effective reconstruction of the prosodic information, while retaining the vocal tract characteristics inherent to the source speaker, which is the highlight of our study. Objective and subjective evaluations of the converted speech validate the proposed approach.
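
The lookup step described in the abstract (using a converted cepstral vector to retrieve excitation and phase coefficients from a tree-encoded target training space) can be illustrated with a minimal sketch. The abstract does not say which kind of binary tree or which distance measure is used, so the code below assumes a k-d tree with squared Euclidean distance. All names (`Node`, `build_tree`, `nearest`, the toy vectors and payload strings) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Node:
    point: Vector      # key: a target-speaker vocal tract cepstral vector (assumed)
    payload: object    # value: excitation/phase data stored with that frame (assumed)
    axis: int          # dimension used to split the space at this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(frames: List[Tuple[Vector, object]], depth: int = 0) -> Optional[Node]:
    """Recursively split the training frames at the median along one axis."""
    if not frames:
        return None
    axis = depth % len(frames[0][0])
    frames = sorted(frames, key=lambda f: f[0][axis])
    mid = len(frames) // 2
    point, payload = frames[mid]
    return Node(point, payload, axis,
                build_tree(frames[:mid], depth + 1),
                build_tree(frames[mid + 1:], depth + 1))


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Vector, best=None):
    """Return (squared distance, node) for the stored vector closest to query."""
    if node is None:
        return best
    d = sq_dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Visit the other side only if the splitting plane is closer than the best match.
    if diff ** 2 < best[0]:
        best = nearest(far, query, best)
    return best


# Toy training space: 2-D "cepstral" keys with placeholder excitation labels.
training = [
    ((0.0, 0.0), "excitation_A"),
    ((1.0, 1.0), "excitation_B"),
    ((4.0, 0.5), "excitation_C"),
    ((0.5, 3.0), "excitation_D"),
]
tree = build_tree(training)
converted_frame = (0.9, 1.2)   # stands in for one converted vocal tract vector
_, match = nearest(tree, converted_frame)
print(match.payload)           # prints "excitation_B"
```

In this sketch the tree is built once from the target training frames, and each converted frame then costs one tree descent instead of a scan over the whole training set. Real cepstral vectors have many more dimensions, where a k-d tree prunes less effectively, so this should be read as an illustration of the idea and not as the authors' implementation.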

Domains

Signal and Image Processing [eess.SP]


https://inria.hal.science/hal-01724375

Submitted on: Tuesday, March 6, 2018, 14:34:38

Last modified on: Wednesday, September 13, 2023, 11:08:04

Dates and versions

hal-01724375, version 1 (06-03-2018)

Identifiers

HAL Id: hal-01724375, version 1

Cite

Imen Ben Othmane, Joseph Di Martino, Kais Ouni. Enhancement of esophageal speech using statistical and neuromimetic voice conversion techniques. Journal of International Science and General Applications, 2018, 1 (1), pp.10. ⟨hal-01724375⟩


Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD

