Learnable pooling with Context Gating for video classification - INRIA - Institut National de Recherche en Informatique et en Automatique
Preprint, Working Paper. Year: 2017

Learnable pooling with Context Gating for video classification

Abstract

Common video representations often deploy an average or maximum pooling of pre-extracted frame features over time. Such an approach provides a simple means to encode feature distributions, but is likely to be suboptimal. As an alternative, we explore combinations of learnable pooling techniques such as Soft Bag-of-words, Fisher Vectors, NetVLAD, GRU and LSTM to aggregate video features over time. We also introduce a learnable non-linear network unit, named Context Gating, aiming at modeling interdependencies between features. We evaluate the method on the multi-modal Youtube-8M Large-Scale Video Understanding dataset using pre-extracted visual and audio features. We demonstrate improvements provided by Context Gating as well as by the combination of learnable pooling methods. We finally show how this leads to the best performance, out of more than 600 teams, in the Kaggle Youtube-8M Large-Scale Video Understanding challenge.
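To make the two ideas in the abstract concrete, here is a minimal sketch in plain Python. It shows the average and maximum pooling baselines over per-frame features, followed by a gating unit applied to the pooled vector. The gating form used here, y = sigmoid(W x + b) multiplied elementwise with x, is an assumption taken from our reading of the full paper and is not stated in the abstract. The function names, the toy frame values, and the all-zero weights are hypothetical placeholders; in practice W and b would be learned.

```python
import math


def average_pool(frames):
    """Mean of each feature dimension over time (frames: list of equal-length lists)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def max_pool(frames):
    """Maximum of each feature dimension over time."""
    return [max(column) for column in zip(*frames)]


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def context_gating(x, W, b):
    """Assumed gating form: sigmoid(W x + b) multiplied elementwise with x.

    W is a d x d list of rows and b has length d, where d = len(x).
    """
    gates = [
        sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]
    return [g * x_i for g, x_i in zip(gates, x)]


# Two toy frames with three features each (illustrative values only).
frames = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
pooled = average_pool(frames)

# Placeholder parameters: zero weights make every gate equal to 0.5.
W = [[0.0] * 3 for _ in range(3)]
b = [0.0] * 3
gated = context_gating(pooled, W, b)
```

Because each gate depends on the whole input vector through W, one feature can suppress or pass another, which is one way to read "modeling interdependencies between features". The learnable pooling methods named in the abstract (NetVLAD, GRU, LSTM and others) would replace `average_pool` in such a pipeline; they are not sketched here.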
Main file
miech17youtube.pdf (6.15 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01547378, version 1 (26-06-2017)

Cite

Antoine Miech, Ivan Laptev, Josef Sivic. Learnable pooling with Context Gating for video classification. 2017. ⟨hal-01547378⟩
453 views
484 downloads