What does the Canary Say? Low-Dimensional GAN Applied to Birdsong - INRIA - Institut National de Recherche en Informatique et en Automatique
Preprint, Working Paper. Year: 2021

What does the Canary Say? Low-Dimensional GAN Applied to Birdsong

Silvia Pagliarini
Nathan Trouvain
  • Role: Author
Arthur Leblois
Xavier Hinaut

Abstract

The generation of speech, and more generally of complex animal vocalizations, by artificial systems is a difficult problem that has recently been addressed with various artificial intelligence techniques. Generative Adversarial Networks (GANs) have proven effective at generating images and, more recently, sounds. The usability of a GAN generating a vocal repertoire depends in part on understanding how the various sounds are represented in the GAN latent space. Here, we test the ability of WaveGAN to produce a set of canary syllables while constraining the latent space to a small dimension. We trained WaveGANs with latent space dimensions ranging from 1 to 6 on a large dataset of canary syllables (16,000 renditions of 16 different syllable types). The sounds produced by the generators are identified and evaluated by an RNN-based classifier trained on the same dataset. This quantitative evaluation is paired with a qualitative evaluation of the GAN output spectrograms across GAN training epochs and latent dimensions, comparing multiple training instances for each condition. Altogether, our results show that a latent space of dimension 3 is enough to produce a varied repertoire of sounds, often indistinguishable in quality from real canary sounds, spanning all the syllable types in the dataset. Importantly, we show that the 3-dimensional GAN generalizes by interpolating between the various syllable types. We rely on UMAP representations to show qualitatively the similarities between the training data and the generated data, and between the generated syllables and the interpolations produced. Exploring the latent representations of syllable types, we show that they form well-identifiable subspaces of the latent space. This study provides tools to train simple sensorimotor models, such as inverse models, that map perceived sounds to motor representations of the same sounds. Both the RNN-based classifier and the low-dimensional GAN provide a way to learn the mappings between perceived and produced sounds.
Main file
Pagliarini2021_canary_GAN__HAL-v1.pdf (21.78 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03244723, version 1 (01-06-2021)
hal-03244723, version 2 (26-11-2021)

Identifiers

  • HAL Id: hal-03244723, version 1

Cite

Silvia Pagliarini, Nathan Trouvain, Arthur Leblois, Xavier Hinaut. What does the Canary Say? Low-Dimensional GAN Applied to Birdsong. 2021. ⟨hal-03244723v1⟩
