INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2022

Multi-stage attention for fine-grained expressivity transfer in multispeaker text-to-speech system

Abstract

The main goal of this work is to provide fine-grained transfer of expressivity to the voices of speakers for whom no expressive speech data is available. Our approach conditions a multispeaker Tacotron 2 system on latent embeddings extracted from the phoneme sequence, the speaker identity, and a reference expressive Mel spectrogram. The proposed system uses attention modules to discover local and global expressivity attributes. Additionally, location-sensitive attention is applied in the decoder to learn the alignment between each phoneme sequence and its Mel spectrogram. In addition to conventional objective metrics for speech synthesis, we use cosine similarity and character error rate (CER) to evaluate expressivity transfer and intelligibility. The results show that the presented cosine similarity metric for speaker and expressivity is consistent with the subjective evaluation. Thus, using multiple evaluation measures provides a way to estimate the strength of the emotions and of the speaker's voice when expressivity is transferred to the target speaker's voice. The results also show that the presented fine-grained TTS systems perform better than the Tacotron 2 based baseline systems.
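The two additional measures named in the abstract, cosine similarity and character error rate, can be illustrated with a minimal sketch. The abstract does not say how the speaker and expressivity embeddings are extracted or which recognizer produces the transcripts, so the sketch assumes the embeddings are already available as plain vectors and the transcripts as strings. The function names `cosine_similarity` and `character_error_rate` are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein edit distance between the strings, divided by len(reference)."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(reference)
```

Under these assumptions, a higher cosine similarity between the embedding of a synthesized utterance and that of a reference indicates a closer match in speaker identity or expressivity, while a lower CER between the input text and a transcript of the synthesized speech indicates better intelligibility.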
Main file
EUSIPCO2022_Expressivity_transfert.pdf (232.57 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03615773, version 1 (21-03-2022)
hal-03615773, version 2 (28-10-2022)

Identifiers

  • HAL Id: hal-03615773, version 2

Cite

Ajinkya Kulkarni, Vincent Colotte, Denis Jouvet. Multi-stage attention for fine-grained expressivity transfer in multispeaker text-to-speech system. EUSIPCO 2022, Aug 2022, Belgrade, Serbia. ⟨hal-03615773v2⟩
