Learnable pooling with Context Gating for video classification

Antoine Miech; Ivan Laptev; Josef Sivic

Preprint, working paper. Year: 2017

Antoine Miech

Role: Author

Models of visual object recognition and scene understanding

Université Paris sciences et lettres

Ivan Laptev

Role: Author

Université Paris sciences et lettres

Models of visual object recognition and scene understanding

Josef Sivic

Role: Author

Université Paris sciences et lettres

Czech Technical University in Prague

Models of visual object recognition and scene understanding

Abstract

Common video representations often rely on average or maximum pooling of pre-extracted frame features over time. Such an approach provides a simple means to encode feature distributions, but is likely to be suboptimal. As an alternative, we explore combinations of learnable pooling techniques such as Soft Bag-of-words, Fisher Vectors, NetVLAD, GRU and LSTM to aggregate video features over time. We also introduce a learnable non-linear network unit, named Context Gating, aimed at modeling interdependencies between features. We evaluate the method on the multi-modal Youtube-8M Large-Scale Video Understanding dataset using pre-extracted visual and audio features. We demonstrate improvements provided by Context Gating as well as by the combination of learnable pooling methods. Finally, we show how this leads to the best performance, out of more than 600 teams, in the Kaggle Youtube-8M Large-Scale Video Understanding challenge.
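As a rough illustration of the two ideas contrasted in the abstract, the sketch below shows the fixed average and maximum pooling baselines and a gating unit of the form Y = σ(WX + b) ∘ X, which is the Context Gating formulation given in the full paper (arXiv 1706.06905). The NumPy implementation, the function names, the feature sizes and the random weights are illustrative assumptions, not the authors' code; in a real model W and b would be learned.

```python
import numpy as np


def average_pool(frames: np.ndarray) -> np.ndarray:
    """Fixed baseline: mean of per-frame features over time (frames has shape T x D)."""
    return frames.mean(axis=0)


def max_pool(frames: np.ndarray) -> np.ndarray:
    """Fixed baseline: per-dimension maximum over time (frames has shape T x D)."""
    return frames.max(axis=0)


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def context_gating(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Context Gating sketch: Y = sigmoid(W x + b) * x (element-wise product).

    Each gate lies in (0, 1) and depends on the whole input vector, so the unit
    can re-weight one feature dimension based on the others. W has shape (D, D)
    and b has shape (D,); here they are plain arguments rather than trained weights.
    """
    return sigmoid(W @ x + b) * x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((300, 8))  # hypothetical: 300 frames, 8-dim features
    video = average_pool(frames)            # one 8-dim descriptor per video
    W = 0.1 * rng.standard_normal((8, 8))   # hypothetical random weights, not trained
    b = np.zeros(8)
    gated = context_gating(video, W, b)
    print(gated.shape)  # (8,)
    # Gates are strictly between 0 and 1, so gating never increases a feature's magnitude.
    print(bool(np.all(np.abs(gated) <= np.abs(video))))
```

In the paper's model, the vector being gated comes from learnable pooling (such as NetVLAD) rather than a plain average; the average is used here only to keep the sketch short.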

Subjects

Computer Science [cs]; Computer Vision and Pattern Recognition [cs.CV]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]

Main file

miech17youtube.pdf (6.15 MB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01547378

Submitted on: Monday, June 26, 2017, 16:10:55

Last modified on: Monday, December 11, 2023, 11:31:29

Long-term archiving on: Wednesday, January 17, 2018, 17:20:26

Dates and versions

hal-01547378 , version 1 (26-06-2017)

Identifiers

HAL Id : hal-01547378 , version 1
ARXIV : 1706.06905

Cite

Antoine Miech, Ivan Laptev, Josef Sivic. Learnable pooling with Context Gating for video classification. 2017. ⟨hal-01547378⟩

Collections

ENS-PARIS CNRS INRIA INRIA2 PSL
