Speech Processing and Prosody

Denis Jouvet

Conference paper, Year: 2019

Role: Author
PersonId: 15904
IdHAL: denis-jouvet
IdRef: 029418666

Speech Modeling for Facilitating Oral-Based Communication

Abstract

The prosody of the speech signal conveys information beyond the linguistic content of the message: prosody structures the utterance and also carries information on the speaker's attitude and emotion. The prosodic features are sound duration, energy, and fundamental frequency. However, their automatic computation and usage are not straightforward. Sound duration features are usually extracted from speech recognition results or from a forced speech-text alignment. Although the resulting segmentation is usually acceptable on clean native speech data, performance degrades on noisy or non-native speech. Many algorithms have been developed for computing the fundamental frequency; they perform rather well on clean speech, but again, performance degrades in noisy conditions. However, in some applications, for example in computer-assisted language learning, the relevance of the prosodic features is critical; indeed, the quality of the diagnosis of the learner's pronunciation depends heavily on the precision and reliability of the estimated prosodic parameters. The paper considers the computation of prosodic features, shows the limitations of automatic approaches, and discusses the problem of computing confidence measures on such features. The paper then discusses the role of prosodic features and how they can be handled for automatic processing in tasks such as the detection of discourse particles, the characterization of emotions, and the classification of sentence modalities, as well as in computer-assisted language learning and expressive speech synthesis.

Keywords

Prosody; Speech processing; Prosodic features; Fundamental frequency

Domains

Signal and Image Processing [eess.SP]

Main file

D.Jouvet--SpeechProsodyAndProcessing-v1.pdf (90.25 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-02177210

Submitted on: Monday, July 8, 2019, 17:08:25

Last modified on: Monday, September 11, 2023, 17:41:19

Dates and versions

hal-02177210, version 1 (8 July 2019)

Identifiers

HAL Id: hal-02177210, version 1

Cite

Denis Jouvet. Speech Processing and Prosody. TSD 2019 - 22nd International Conference of Text, Speech and Dialogue, Sep 2019, Ljubljana, Slovenia. ⟨hal-02177210⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD
