Tube-CNN: Modeling temporal evolution of appearance for object detection in video - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Pré-Publication, Document De Travail Année : 2019

Tube-CNN: Modeling temporal evolution of appearance for object detection in video

Résumé

Object detection in video is crucial for many applications. Compared to images, video provides additional cues which can help to disambiguate the detection problem. Our goal in this paper is to learn discriminative models for the temporal evolution of object appearance and to use such models for object detection. To model temporal evolution, we introduce space-time tubes corresponding to temporal sequences of bounding boxes. We propose two CNN architectures for generating and classifying tubes, respectively. Our tube proposal network (TPN) first generates a large number of spatio-temporal tube proposals maximizing object recall. The Tube-CNN then implements a tube-level object detector in the video. Our method improves state of the art on two large-scale datasets for object detection in video: HollywoodHeads and ImageNet VID. Tube models show particular advantages in difficult dynamic scenes.

Dates et versions

hal-01980339 , version 1 (14-01-2019)

Identifiants

Citer

Tuan-Hung Vu, Anton Osokin, Ivan Laptev. Tube-CNN: Modeling temporal evolution of appearance for object detection in video. 2019. ⟨hal-01980339⟩
165 Consultations
0 Téléchargements

Altmetric

Partager

Gmail Facebook X LinkedIn More