Learning from Video and Text via Large-Scale Discriminative Clustering
Résumé
Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks. Such applications include person and action recognition, text-to-video alignment, object co-segmentation and co-localization in videos and images. One drawback of dis-criminative clustering, however, is its limited scalability. We address this issue and propose an online optimization algorithm based on the Block-Coordinate Frank-Wolfe algorithm. We apply it to the problem of weakly-supervised learning of actions and actors from movies and corresponding movie scripts. The scaling up of the learning problem to 66 feature-length movies enables us to significantly improve weakly-supervised action recognition.
Origine : Fichiers produits par l'(les) auteur(s)
Loading...