Journal article. Communications in Statistics - Simulation and Computation, Year: 2021

Combining clustering of variables and feature selection using random forests

Abstract

Standard approaches to high-dimensional supervised classification often include variable selection and dimension reduction. The proposed methodology combines clustering of variables and feature selection. Hierarchical clustering of variables builds groups of correlated variables and summarizes each group by a synthetic variable. The originality of the approach is that the groups of variables are not known a priori. Moreover, the clustering approach handles both numerical and categorical variables. Among all the possible partitions, the most relevant synthetic variables are selected with a procedure using random forests. Numerical performance is illustrated on simulated and real datasets. Selecting groups of variables makes the results easier to interpret.
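
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it makes several simplifying assumptions that are not in the source: variables are numerical only, the dissimilarity between variables is taken to be 1 minus the squared correlation, each synthetic variable is the first principal component of its group, the number of groups is fixed in advance rather than chosen among all partitions, and groups are ranked by the impurity-based importance of a scikit-learn random forest. The function names `cluster_variables`, `synthetic_variables`, and `rank_groups` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def cluster_variables(X, n_groups):
    """Group the columns of X by hierarchical clustering.

    Assumption: dissimilarity between two variables is 1 - corr^2.
    """
    corr = np.corrcoef(X, rowvar=False)
    dist = 1.0 - corr ** 2
    dist = (dist + dist.T) / 2.0      # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)       # a variable is at distance 0 from itself
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_groups, criterion="maxclust")


def synthetic_variables(X, labels):
    """Summarize each group by the first principal component of its columns."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    components = []
    for group in np.unique(labels):
        block = Xs[:, labels == group]
        U, s, _ = np.linalg.svd(block, full_matrices=False)
        components.append(U[:, 0] * s[0])   # scores on the first component
    return np.column_stack(components)


def rank_groups(S, y, seed=0):
    """Rank synthetic variables by random forest importance (most important first)."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(S, y)
    importances = forest.feature_importances_
    return np.argsort(importances)[::-1], importances


# Simulated data: 3 latent factors, each observed through 4 noisy copies.
# Only the first factor drives the class label.
rng = np.random.default_rng(0)
n = 300
latent = rng.normal(size=(n, 3))
X = np.repeat(latent, 4, axis=1) + 0.3 * rng.normal(size=(n, 12))
y = (latent[:, 0] > 0).astype(int)

labels = cluster_variables(X, n_groups=3)   # group label of each column
S = synthetic_variables(X, labels)          # one synthetic variable per group
order, importances = rank_groups(S, y)      # order[0] indexes the top group
print("group labels per variable:", labels)
print("groups ranked by importance:", order)
```

In this toy setting the three blocks of correlated columns should be recovered as three groups, and the group built from the first four columns should rank first. Selecting a group then amounts to selecting all the original variables it contains, which is what makes the result easier to read than a list of individual variables.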

Dates and versions

hal-02013631 , version 1 (11-02-2019)

Cite

Marie Chavent, Robin Genuer, Jerome Saracco. Combining clustering of variables and feature selection using random forests. Communications in Statistics - Simulation and Computation, 2021, 50 (2), pp.426-445. ⟨10.1080/03610918.2018.1563145⟩. ⟨hal-02013631⟩