Late Fusion of Bayesian and Convolutional Models for Action Recognition
Abstract
Our daily-life activities are generally carried out as a succession of atomic actions that follow a logical order, and the actions observed in a video sequence usually follow such an order as well. In this paper, we propose a hybrid approach that fuses a deep neural network with a Bayesian-based approach; the latter models human-object interactions and the transitions between actions. The key idea is to combine both approaches in the final prediction. We validate our strategy on two public datasets, CAD-120 and Watch-n-Patch, where our fusion approach improves accuracy over a baseline approach by +4 percentage points (pp) and +6 pp, respectively. The fusion clearly improves temporal action recognition performance, especially when classes are imbalanced.

[…] the decision level, of a C3D [3] convolutional network and our probabilistic ANBM [9] approach based on explicit human-object observations. Both approaches take into account the spatio-temporal characteristics of the different action classes. Because of its large number of parameters, the C3D network needs a large amount of annotated data to perform well, and learning is difficult for under-represented classes. The ANBM approach, by contrast, relies on handcrafted models, so it can still predict under-represented classes from little data.

Our contributions are thus: (1) a first, minor contribution: the addition of a Gated Recurrent Unit (GRU) recurrent layer to the C3D architecture for action recognition, which also models the temporal correlations between actions; (2) a comparison of both approaches (ANBM and C3D-GRU) on two public datasets, CAD-120 and Watch-n-Patch; (3) the implementation and evaluation of a late fusion mechanism for the predictions of these two approaches, with a comparison to the literature. This hybrid approach yields a performance gain.
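Decision-level (late) fusion of two predictors can be illustrated with a minimal sketch. It assumes that each model outputs one non-negative score per action class for the same video segment, and that the scores are combined by a weighted average followed by an argmax. The weight `alpha`, the function names and the example scores are hypothetical and not taken from the paper, whose actual fusion rule may differ.

```python
from typing import List, Sequence


def normalize(scores: Sequence[float]) -> List[float]:
    """Rescale non-negative class scores so that they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def late_fusion(p_cnn: Sequence[float], p_bayes: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Weighted average of two per-class score vectors (decision-level fusion).

    Assumption: alpha is a hypothetical weight given to the convolutional
    (C3D-GRU) branch; the Bayesian (ANBM) branch receives 1 - alpha.
    """
    if len(p_cnn) != len(p_bayes):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    p_cnn, p_bayes = normalize(p_cnn), normalize(p_bayes)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_cnn, p_bayes)]


def predict(p_fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(p_fused)), key=lambda i: p_fused[i])


if __name__ == "__main__":
    # Hypothetical scores for three action classes on one segment.
    cnn_scores = [0.6, 0.3, 0.1]    # convolutional branch favours class 0
    bayes_scores = [0.1, 0.2, 0.7]  # Bayesian branch favours class 2
    for alpha in (0.5, 0.8):
        fused = late_fusion(cnn_scores, bayes_scores, alpha)
        print(alpha, [round(p, 2) for p in fused], predict(fused))
```

With equal weights the fused scores are about [0.35, 0.25, 0.4], so the Bayesian branch's confident vote for class 2 wins; with `alpha = 0.8` the convolutional branch dominates and class 0 is predicted. In such a scheme the weight would typically be chosen on held-out data.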
The article is organized as follows. Section 2 presents the state of the art and the context of our work. Section 3 presents our hybrid approach for action detection. Section 4 presents a comparative study of our results. Finally, Section 5 concludes and outlines future work.
Origin: Files produced by the author(s)