What does the Canary Say? Low-Dimensional GAN Applied to Birdsong

Silvia Pagliarini; Nathan Trouvain; Arthur Leblois; Xavier Hinaut

Preprint, Working Paper. Year: 2021

Silvia Pagliarini

Role: Author
PersonId: 176970
IdHAL: silvia-pagliarini
ORCID: 0000-0002-3260-1316

Mnemonic Synergy

Nathan Trouvain

Role: Author

Mnemonic Synergy

Arthur Leblois

Role: Author
PersonId: 178193
IdHAL: arthur-leblois
ORCID: 0000-0002-9392-5939
IdRef: 111131677

Institut des Maladies Neurodégénératives [Bordeaux]

Xavier Hinaut

Role: Author
PersonId: 8171
IdHAL: xavier-hinaut
ORCID: 0000-0002-1924-1184
IdRef: 22823218X

Mnemonic Synergy

Abstract

The generation of speech, and more generally of complex animal vocalizations, by artificial systems is a difficult problem that has recently been addressed using various artificial intelligence techniques. Generative Adversarial Networks (GANs) have proven highly effective at generating images and, more recently, sounds. The usability of a GAN that generates a vocal repertoire relies in part on our understanding of how the various sounds are represented in the GAN latent space. Here, we test the ability of WaveGAN to produce a set of canary syllables while constraining the latent space to a small dimension. We trained WaveGANs with latent space dimensions varying from 1 to 6 on a large dataset of canary syllables (16000 renditions of 16 different syllable types). The sounds produced by the generators are identified and evaluated by an RNN-based classifier trained on the same dataset. This quantitative evaluation is paired with a qualitative evaluation of the GAN output spectrograms across GAN training epochs and latent dimensions, comparing multiple training instances for each condition. Altogether, our results show that a latent space of dimension 3 is enough to produce a varied repertoire of sounds, often indistinguishable in quality from real canary sounds, spanning all the syllable types of the dataset. Importantly, we show that the 3-dimensional GAN generalizes by interpolating between the various syllable types. We rely on UMAP representations to show qualitatively the similarities between the training data and the generated data, and between the generated syllables and the interpolations produced. Exploring the latent representations of syllable types, we show that they form well-identifiable subspaces of the latent space. This study provides tools to train simple sensorimotor models, such as inverse models, that map perceived sounds to motor representations of the same sounds. Both the RNN-based classifier and the low-dimensional GAN provide a way to learn the mappings between perceived and produced sounds.
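
The interpolation between syllable types mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain linear interpolation between two latent vectors and a uniform latent prior on [-1, 1]; neither choice is confirmed by the abstract. The function name `interpolate_latent` is hypothetical, and the trained generator that would turn each latent vector into a waveform is not shown.

```python
import numpy as np


def interpolate_latent(z_start, z_end, n_steps):
    """Return n_steps latent vectors evenly spaced on the segment z_start -> z_end.

    Hypothetical helper for illustration; linear interpolation is an assumption.
    """
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    # Weights from 0 (z_start) to 1 (z_end), both endpoints included.
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - alphas) * z_start + alphas * z_end


# Two example points in a 3-dimensional latent space (assumed uniform prior).
rng = np.random.default_rng(0)
z_a = rng.uniform(-1.0, 1.0, size=3)
z_b = rng.uniform(-1.0, 1.0, size=3)

# Each row of `path` is one latent vector that a generator could decode.
path = interpolate_latent(z_a, z_b, n_steps=5)
print(path.shape)  # (5, 3)
```

Feeding each row of `path` to a trained generator and classifying the resulting sounds would be one way to examine how the output changes between two syllable types.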

Keywords

Generative adversarial network; Latent space; Sound generation; Birdsong

Domains

Neural networks [cs.NE]; Artificial intelligence [cs.AI]; Machine learning [cs.LG]; Neuroscience [q-bio.NC]

Main file

Pagliarini2021_canary_GAN__HAL-v1.pdf (21.78 MB)

Origin: Files produced by the author(s)

Contributor: Xavier Hinaut

https://inria.hal.science/hal-03244723

Submitted on: Tuesday, June 1, 2021, 15:20:54

Last modified on: Friday, March 24, 2023, 14:53:21

Long-term archiving on: Thursday, September 2, 2021, 19:03:11

Dates and versions

hal-03244723, version 1 (01-06-2021)

hal-03244723, version 2 (26-11-2021)

Identifiers

HAL Id: hal-03244723, version 1

Cite

Silvia Pagliarini, Nathan Trouvain, Arthur Leblois, Xavier Hinaut. What does the Canary Say? Low-Dimensional GAN Applied to Birdsong. 2021. ⟨hal-03244723v1⟩

326 views

119 downloads
