What does the Canary Say? Low-Dimensional GAN Applied to Birdsong - INRIA - Institut National de Recherche en Informatique et en Automatique
Preprint, Working Paper. Year: 2021

What does the Canary Say? Low-Dimensional GAN Applied to Birdsong

Silvia Pagliarini
Nathan Trouvain
Arthur Leblois
Xavier Hinaut

Abstract

The generation of speech, and more generally of complex animal vocalizations, by artificial systems is a difficult problem. Generative Adversarial Networks (GANs) have proven highly effective at generating images and, more recently, sounds. While current GANs have high-dimensional latent spaces, complex vocalizations could in principle be generated from a low-dimensional latent space, which would ease the visualization and evaluation of latent representations. In this study, we aim to test the ability of a previously developed GAN, WaveGAN, to reproduce canary syllables while drastically reducing the latent space dimension. We trained WaveGAN on a large dataset of canary syllables (16,000 renditions of 16 different syllable types) and varied the latent space dimension from 1 to 6. The sounds produced by the generator are evaluated with an RNN-based classifier. This quantitative evaluation is paired with a qualitative evaluation of the GAN productions across training epochs and latent dimensions. Altogether, our results show that a 3-dimensional latent space is enough to produce all syllable types in the repertoire, with a quality often indistinguishable from real canary vocalizations. Importantly, we show that the 3-dimensional GAN generalizes by interpolating between the various syllable types. We rely on UMAP [1] to show qualitatively the similarities between training and generated data, and between the generated syllables and the interpolations produced. We discuss how our study may provide tools to train simple models of vocal production and/or learning. Indeed, while the RNN-based classifier provides a biologically realistic representation of the auditory network processing vocalizations, the low-dimensional GAN may be used for the production of complex vocal repertoires.
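To make the idea of a low-dimensional latent space concrete, here is a minimal Python sketch of the latent-side operations the abstract alludes to: drawing a 3-dimensional latent vector, walking a straight line between two latent points (one simple way to probe what lies between two syllable types), and measuring how many syllable types a classifier finds among generated sounds. It is a sketch under stated assumptions, not the authors' code. The uniform prior on [-1, 1] follows the usual WaveGAN setup but is assumed here, linear interpolation is only one possible choice, and the helper names (`sample_latent`, `interpolate`, `repertoire_coverage`) and the syllable labels are hypothetical. The trained generator and the RNN-based classifier are not reproduced.

```python
import random

# Assumption: 3 latent dimensions, the smallest size the abstract reports as
# sufficient; the study itself varied this from 1 to 6.
LATENT_DIM = 3


def sample_latent(dim=LATENT_DIM, rng=None):
    """Draw one latent vector uniformly from [-1, 1]^dim (assumed prior)."""
    rng = rng if rng is not None else random.Random()
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def interpolate(z_a, z_b, steps):
    """Return `steps` evenly spaced points on the line from z_a to z_b, endpoints included."""
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at z_a, 1.0 at z_b
        path.append([(1 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


def repertoire_coverage(predicted_labels, syllable_types):
    """Fraction of the known syllable types predicted at least once.

    Labels outside `syllable_types` (e.g. a hypothetical noise class) are ignored.
    """
    types = set(syllable_types)
    if not types:
        raise ValueError("syllable_types must not be empty")
    seen = {label for label in predicted_labels if label in types}
    return len(seen) / len(types)


if __name__ == "__main__":
    rng = random.Random(0)
    z_a, z_b = sample_latent(rng=rng), sample_latent(rng=rng)
    for z in interpolate(z_a, z_b, steps=5):
        # Each z would be fed to the trained generator (not included here).
        print([round(v, 3) for v in z])
    # Hypothetical classifier outputs over a hypothetical 4-type repertoire.
    print(repertoire_coverage(["A", "B", "A", "noise"], ["A", "B", "C", "D"]))
```

In use, each point returned by `interpolate` would be passed to the trained generator, and the resulting waveforms to the classifier, whose labels could then be summarized with `repertoire_coverage`. Keeping the latent vectors as plain 3-element lists is what makes such paths easy to plot and inspect directly, which is the practical appeal of a low-dimensional latent space.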
Main file
Pagliarini2021_canary_GAN__HAL-v2.pdf (22.56 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03244723, version 1 (1 June 2021)
hal-03244723, version 2 (26 November 2021)

Identifiers

  • HAL Id: hal-03244723, version 2

Cite

Silvia Pagliarini, Nathan Trouvain, Arthur Leblois, Xavier Hinaut. What does the Canary Say? Low-Dimensional GAN Applied to Birdsong. 2021. ⟨hal-03244723v2⟩
