Learning Multi-Modal Dictionaries

Gianluca Monaci; Philippe Jost; Pierre Vandergheynst; Boris Mailhé; Sylvain Lesage; Rémi Gribonval

doi:10.1109/TIP.2007.901813

Journal article, IEEE Transactions on Image Processing, Year: 2007

Gianluca Monaci

Role: Author

LTS2 - EPFL

Philippe Jost

Role: Author

LTS2 - EPFL

Pierre Vandergheynst

Role: Author

LTS2 - EPFL

Boris Mailhé

Role: Author

Speech and sound data modeling and processing

Sylvain Lesage

Role: Author
PersonId: 13547
IdHAL: sylvain-lesage
ORCID: 0000-0002-8462-0957
IdRef: 180816713

Speech and sound data modeling and processing

Rémi Gribonval

Role: Author
PersonId: 1255
IdHAL: remi-gribonval
ORCID: 0000-0002-9450-8125
IdRef: 113181590

Speech and sound data modeling and processing

Abstract

Real-world phenomena involve complex interactions between multiple signal modalities. As a consequence, humans are used to integrating, at each instant, perceptions from all their senses in order to enrich their understanding of the surrounding world. This paradigm can also be extremely useful in many signal processing and computer vision problems involving mutually related signals. The simultaneous processing of multimodal data can, in fact, reveal information that is otherwise hidden when considering the signals independently. However, in natural multimodal signals, the statistical dependencies between modalities are in general not obvious. Learning fundamental multimodal patterns could offer deep insight into the structure of such signals. In this paper, we present a novel model of multimodal signals based on their sparse decomposition over a dictionary of multimodal structures. We also propose an algorithm for iteratively learning multimodal generating functions that can be shifted to all positions in the signal. The learning is defined in such a way that it can be accomplished by iteratively solving a generalized eigenvector problem, which makes the algorithm fast, flexible, and free of user-defined parameters. The proposed algorithm is applied to audiovisual sequences, where it is able to discover underlying structures in the data. Detecting such audio-video patterns in audiovisual clips makes it possible to effectively localize the sound source in the video in the presence of substantial acoustic and visual distractors, outperforming state-of-the-art audiovisual localization algorithms.
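The abstract gives no formulas, so the following is only a toy sketch of the general idea of a multimodal pattern that can be shifted to any position in a signal. It is not the authors' algorithm and does not include the learning step or the generalized eigenvector problem. It assumes two one-dimensional modalities and a single hypothetical atom made of an "audio" part and a "video" part that always occur at the same position. All names, sizes, and values (`place`, `best_joint_shift`, the kernels, the shift 7) are invented for illustration.

```python
# Toy illustration only: a two-modality signal built from one shared, shiftable
# pattern. All names and values are hypothetical, not taken from the paper.

def place(kernel, shift, length, coeff=1.0):
    """Return a zero signal of `length` with `coeff * kernel` added at `shift`."""
    out = [0.0] * length
    for i, k in enumerate(kernel):
        out[shift + i] += coeff * k
    return out

def correlate_at(signal, kernel, shift):
    """Inner product of `kernel` with the window of `signal` starting at `shift`."""
    return sum(signal[shift + i] * k for i, k in enumerate(kernel))

def best_joint_shift(signals, atom):
    """Return the shift maximising the summed squared correlation over modalities.

    `signals` and `atom` hold one entry per modality. Every modality is tested
    at the same shift, which is what ties the modalities together in this toy.
    """
    length = min(len(s) for s in signals)
    width = max(len(k) for k in atom)
    scores = []
    for shift in range(length - width + 1):
        score = sum(correlate_at(s, k, shift) ** 2 for s, k in zip(signals, atom))
        scores.append(score)
    return max(range(len(scores)), key=scores.__getitem__)

# One hypothetical multimodal atom: an "audio" part and a "video" part.
audio_kernel = [0.5, 1.0, 0.5]
video_kernel = [1.0, -1.0, 0.0]
atom = (audio_kernel, video_kernel)

# Synthesise both modalities with the atom placed at the same position.
true_shift = 7
audio = place(audio_kernel, true_shift, 20, coeff=2.0)
video = place(video_kernel, true_shift, 20, coeff=-1.5)

# A distractor that appears in the audio modality only.
audio[2] += 1.0

found_shift = best_joint_shift((audio, video), atom)
print(found_shift)
```

In this toy, the audio-only distractor scores lower than the true position because it has no matching structure in the second modality, which loosely mirrors the abstract's point that joint processing can reveal what single-modality analysis hides.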

Domains

Signal and Image Processing [eess.SP]

Main file

2007_TIP_MonaciEtAl.pdf (1.07 MB)

Origin: Files produced by the author(s)

Contributor: Rémi Gribonval

https://inria.hal.science/inria-00544772

Submitted on: Monday, February 7, 2011, 16:35:27

Last modified on: Monday, April 22, 2024, 12:20:15

Long-term archiving on: Sunday, May 8, 2011, 02:32:50

Dates and versions

inria-00544772 , version 1 (07-02-2011)

Identifiers

HAL Id: inria-00544772, version 1
DOI: 10.1109/TIP.2007.901813

Cite

Gianluca Monaci, Philippe Jost, Pierre Vandergheynst, Boris Mailhé, Sylvain Lesage, et al. Learning Multi-Modal Dictionaries. IEEE Transactions on Image Processing, 2007, 16 (9), pp.2272-2283. ⟨10.1109/TIP.2007.901813⟩. ⟨inria-00544772⟩

Collections

EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D5 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES INSA-GROUPE UR1-MATH-NUM

291 views

535 downloads
