Learning from Video and Text via Large-Scale Discriminative Clustering

Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks. Such applications include person and action recognition, text-to-video alignment, and object co-segmentation and co-localization in videos and images. One drawback of discriminative clustering, however, is its limited scalability. We address this issue and propose an online optimization algorithm based on the Block-Coordinate Frank-Wolfe algorithm. We apply it to the problem of weakly-supervised learning of actions and actors from movies and corresponding movie scripts. Scaling the learning problem up to 66 feature-length movies enables us to significantly improve weakly-supervised action recognition.
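For readers unfamiliar with the Block-Coordinate Frank-Wolfe scheme named in the abstract, the following is a minimal sketch of the general technique on a toy problem. It is not the formulation used in the paper: the quadratic objective, the choice of one simplex-constrained row per block, the exact line search, and the names `objective` and `bcfw_simplex` are all assumptions made for illustration.

```python
import random


def objective(Y, T):
    """Toy objective (an assumption): 0.5 * squared Frobenius distance to T."""
    return 0.5 * sum((y - t) ** 2 for yr, tr in zip(Y, T) for y, t in zip(yr, tr))


def bcfw_simplex(T, n_iters=500, seed=0):
    """Block-coordinate Frank-Wolfe for min_Y 0.5 * ||Y - T||^2,
    where each row of Y is one block constrained to the probability simplex."""
    rng = random.Random(seed)
    n, k = len(T), len(T[0])
    Y = [[1.0 / k] * k for _ in range(n)]  # feasible start: uniform rows
    for _ in range(n_iters):
        i = rng.randrange(n)  # sample a single block per iteration
        grad = [y - t for y, t in zip(Y[i], T[i])]  # gradient w.r.t. block i
        # Linear minimization oracle over the simplex: the best vertex e_j.
        j = min(range(k), key=grad.__getitem__)
        # Frank-Wolfe direction d = e_j - Y_i.
        d = [(1.0 if c == j else 0.0) - y for c, y in enumerate(Y[i])]
        denom = sum(v * v for v in d)
        if denom == 0.0:
            continue  # block already sits on the selected vertex
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = -sum(g * v for g, v in zip(grad, d)) / denom
        gamma = max(0.0, min(1.0, gamma))
        Y[i] = [y + gamma * v for y, v in zip(Y[i], d)]
    return Y


if __name__ == "__main__":
    T = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3]]
    Y = bcfw_simplex(T)
    print(objective(Y, T))
```

Each iteration touches only one block, which is what makes this family of methods suited to online, large-scale settings: the cost of a step does not grow with the number of blocks.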

Domains

Computer Science [cs] Artificial Intelligence [cs.AI] Computer Vision and Pattern Recognition [cs.CV]

Main file

ICCV_arxiv.pdf (3.19 MB)

Origin: Files produced by the author(s)

Contributor: Antoine Miech

https://inria.hal.science/hal-01569540

Submitted on: Thursday, July 27, 2017, 07:56:51

Last modified on: Friday, April 19, 2024, 16:18:56

Dates and versions

hal-01569540, version 1 (27-07-2017)

hal-01569540, version 2 (28-07-2017)

Identifiers

HAL Id: hal-01569540, version 1

Cite

Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev, Josef Sivic (Dir.). Learning from Video and Text via Large-Scale Discriminative Clustering. published by the authors, 2017. ⟨hal-01569540v1⟩

