Optimal spectral transportation with application to music transcription

Rémi Flamary; Cédric Févotte; Nicolas Courty; Valentin Emiya

Conference paper, Year: 2016

Rémi Flamary

Role: Author
PersonId: 22
IdHAL: remi-flamary
ORCID: 0000-0002-4212-6627
IdRef: 188395008

Observatoire de la Côte d'Azur

Joseph Louis LAGRANGE

Cédric Févotte

Role: Author
PersonId: 184864
IdHAL: cedric-fevotte
ORCID: 0000-0003-3801-5534
IdRef: 083298460

Signal et Communications

Joseph Louis LAGRANGE

Centre National de la Recherche Scientifique

Nicolas Courty

Role: Author
PersonId: 2118
IdHAL: nicolas-courty
ORCID: 0000-0003-1353-0126
IdRef: 103931317

Environment observation with complex imagery

Valentin Emiya

Role: Author
PersonId: 302
IdHAL: valentin-emiya
ORCID: 0000-0001-7102-6943
IdRef: 158421981

éQuipe AppRentissage et MultimediA [Marseille]

Abstract

Many spectral unmixing methods rely on the non-negative decomposition of spectral data onto a dictionary of spectral templates. In particular, state-of-the-art music transcription systems decompose the spectrogram of the input signal onto a dictionary of representative note spectra. The typical measures of fit used to quantify the adequacy of the decomposition compare the data and template entries frequency-wise. As such, small displacements of energy from one frequency bin to another, as well as variations of timbre, can disproportionately harm the fit. We address these issues by means of optimal transportation and propose a new measure of fit that treats the frequency distributions of energy holistically as opposed to frequency-wise. Building on the harmonic nature of sound, the new measure is invariant to shifts of energy to harmonically related frequencies, as well as to small and local displacements of energy. Equipped with this new measure of fit, the dictionary of note templates can be considerably simplified to a set of Dirac vectors located at the target fundamental frequencies (musical pitch values). This in turn gives rise to a very fast and simple decomposition algorithm that achieves state-of-the-art performance on real musical data.

1 Context

Many of today's spectral unmixing techniques rely on non-negative matrix decompositions. This is the case, for example, in hyperspectral remote sensing (with applications in Earth observation, astronomy, chemistry, etc.) and in audio signal processing. The spectral sample v_n (the spectrum of light observed at a given pixel n, or the audio spectrum in a given time frame n) is decomposed onto a dictionary W of elementary spectral templates, characteristic of pure materials or sound objects, such that v_n ≈ W h_n. The composition of sample n can be inferred from the non-negative expansion coefficients h_n.
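
The decomposition v_n ≈ W h_n with a fixed dictionary can be illustrated with a minimal sketch. It assumes the classical multiplicative update for the generalized Kullback-Leibler divergence, which is one common frequency-wise measure of fit and not necessarily the one used by the systems cited here. The function name `nonneg_decompose`, the toy sizes, and the random data are illustrative assumptions.

```python
import numpy as np

def nonneg_decompose(V, W, n_iter=200, eps=1e-12):
    """Estimate H >= 0 such that V ≈ W @ H, with the dictionary W held fixed.

    Assumption: multiplicative updates for the generalized
    Kullback-Leibler divergence (a frequency-wise measure of fit).
    """
    V = np.asarray(V, dtype=float)
    W = np.asarray(W, dtype=float)
    n_templates, n_frames = W.shape[1], V.shape[1]
    # Strictly positive initialization keeps every update well defined.
    H = np.full((n_templates, n_frames), V.mean() + eps)
    col_sums = W.sum(axis=0)[:, None] + eps  # W^T 1, one value per template
    for _ in range(n_iter):
        ratio = V / (W @ H + eps)            # entry-wise data / model
        H *= (W.T @ ratio) / col_sums        # update preserves non-negativity
    return H

# Toy data (hypothetical): 64 frequency bins, 5 templates, 10 frames.
rng = np.random.default_rng(0)
W = rng.random((64, 5))
H_true = rng.random((5, 10))
V = W @ H_true

H_est = nonneg_decompose(V, W)
rel_err = np.linalg.norm(V - W @ H_est) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Each column of `H_est` plays the role of h_n: its entries indicate how strongly each template contributes to the corresponding spectral sample.
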
This paradigm has led to state-of-the-art results for various tasks (recognition, classification, denoising, separation) in the aforementioned areas, and in particular in music transcription, the central application of this paper. In state-of-the-art music transcription systems, the spectrogram V (with columns v_n) of a musical signal is decomposed onto a dictionary of pure notes (in so-called multi-pitch estimation) or chords. V typically consists of (power-)magnitude values of a regular short-time Fourier transform (Smaragdis and Brown, 2003). It may also consist of an audio-specific spectral transform such as the Mel-frequency transform, as in (Vincent et al., 2010), or the constant-Q transform, as in (Oudre et al., 2011). The success of the transcription system depends of course on the adequacy of the time-frequency transform and the dictionary to represent the data V.
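
The transport-based measure of fit described in the abstract can be made concrete with a rough sketch. The abstract does not state the cost function, so the form below is an assumption chosen to match the two stated invariances: moving energy from frequency f_i to a note with fundamental f0_k costs the squared distance to the nearest harmonic q * f0_k, plus a small penalty `eps` when q is not 1. With Dirac templates and no regularization, each bin's energy then simply goes to its cheapest note. The names `harmonic_cost` and `transport_to_diracs`, and all parameter values, are hypothetical.

```python
import numpy as np

def harmonic_cost(freqs, f0s, q_max=10, eps=1.0):
    """Cost matrix (bins x notes) that is cheap near any harmonic of a note.

    Assumed form: min over q in 1..q_max of (f_i - q * f0_k)^2 + eps * [q != 1].
    """
    f = np.asarray(freqs, dtype=float)[:, None, None]   # (M, 1, 1)
    f0 = np.asarray(f0s, dtype=float)[None, :, None]    # (1, K, 1)
    q = np.arange(1, q_max + 1)[None, None, :]          # (1, 1, Q)
    cost = (f - q * f0) ** 2 + eps * (q != 1)
    return cost.min(axis=2)                             # (M, K)

def transport_to_diracs(V, C):
    """Send the energy of each frequency bin to the note with the lowest cost."""
    V = np.asarray(V, dtype=float)
    best_note = C.argmin(axis=1)                # cheapest note for every bin
    H = np.zeros((C.shape[1], V.shape[1]))
    np.add.at(H, best_note, V)                  # accumulate energy per note
    return H

# Toy frame (hypothetical): bins every 10 Hz, two candidate notes.
freqs = np.arange(0.0, 1000.0, 10.0)
f0s = [220.0, 277.0]
V = np.zeros((freqs.size, 1))
V[23, 0] = 1.0    # 230 Hz: slightly displaced fundamental of the 220 Hz note
V[44, 0] = 0.5    # 440 Hz: second harmonic of the 220 Hz note
V[28, 0] = 2.0    # 280 Hz: slightly displaced fundamental of the 277 Hz note
V[55, 0] = 0.25   # 550 Hz: close to the second harmonic of the 277 Hz note

C = harmonic_cost(freqs, f0s)
H = transport_to_diracs(V, C)
print(H[:, 0])    # energy attributed to each note
```

In this toy frame the displaced and harmonic energy is credited to the intended note, and the total energy is preserved, which is the behaviour a frequency-wise comparison against a single-peak template would not give.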

Domains

Machine Learning [stat.ML]; Signal and Image Processing [eess.SP]

Main file

nips_2016.pdf (835.97 KB)

Origin: Files produced by the author(s)

Contributor: Nicolas Courty

https://hal.science/hal-01377533

Submitted on: Friday, October 7, 2016, 10:17:31

Last modified on: Monday, February 26, 2024, 11:22:09

Long-term archiving on: Friday, February 3, 2017, 18:58:08

Dates and versions

hal-01377533, version 1 (07-10-2016)

Identifiers

HAL Id: hal-01377533, version 1
arXiv: 1609.09799

Cite

Rémi Flamary, Cédric Févotte, Nicolas Courty, Valentin Emiya. Optimal spectral transportation with application to music transcription. Advances in Neural Information Processing Systems (NIPS), Dec 2016, Barcelona, Spain. ⟨hal-01377533⟩

Collections

INSU INSTITUT-TELECOM UNIV-TLSE2 UNIV-RENNES1 LIF CNRS INRIA UNIV-AMU INSA-RENNES IRISA EC-MARSEILLE OCA DIEUDONNE LAGRANGE SMS UBS IRISA_UBS UT1-CAPITOLE CENTRALESUPELEC IRISA-D5 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-COTEDAZUR LIS-LAB UNIV-RENNES IRIT IRIT-SC ANR UR1-MATH-NUM IRIT-SI IRIT-CNRS TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP
