Conference paper, Year: 2014

VSURF: an R package for variable selection using random forests

Abstract

Variable selection is a crucial issue in many applied classification and regression problems. For statistical analysis as well as for modeling or prediction, it is often of interest to remove irrelevant variables, to select all important ones, or to determine a subset sufficient for prediction. From a statistical learning perspective, these objectives call for variable selection in order to simplify statistical problems, to help diagnosis and interpretation, and to speed up data processing. The authors have proposed a variable selection method based on random forests; the aim of this presentation is to describe the associated R package, VSURF (recently made available on CRAN), and to illustrate its use on real datasets. Introduced by Breiman, random forests (abbreviated RF in the sequel) are an attractive non-parametric statistical method for such problems, since they require only mild conditions on the model assumed to have generated the observed data. Indeed, because RF are based on decision trees and use aggregation ideas, they handle different models and problems, namely regression, two-class classification, and multiclass classification, within an elegant and versatile framework. In Genuer et al. (2010) we distinguished two variable selection objectives: interpretation and prediction. The first is to find the important variables highly related to the response variable, selecting all of them even when they are highly redundant. The second is to find a small number of variables sufficient for a good, parsimonious prediction of the response variable. We proposed a two-step procedure: the first step is the same for both objectives, while the second depends on the objective.
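As an illustration of how the package described above might be called, here is a minimal sketch in R. It assumes the VSURF package is installed from CRAN and uses the `toys` dataset shipped with it; the seed and the reduced `ntree`, `nfor.thres`, `nfor.interp`, and `nfor.pred` values are illustrative choices made only to shorten the run, not settings taken from the abstract.

```r
# Minimal usage sketch (assumes: install.packages("VSURF") has been run).
library(VSURF)

# 'toys' is an example dataset bundled with the package:
# toys$x holds the predictors, toys$y the class labels.
data("toys", package = "VSURF")

set.seed(2014)  # arbitrary seed, only for reproducibility of this sketch

# Run the whole selection procedure. The reduced numbers of trees and
# forests below are illustrative values chosen to keep the run short.
fit <- VSURF(x = toys$x, y = toys$y,
             ntree = 500, nfor.thres = 20,
             nfor.interp = 10, nfor.pred = 10)

# Indices (columns of toys$x) of the variables kept at each stage.
fit$varselect.thres    # variables kept after the preliminary step
fit$varselect.interp   # selection for the interpretation objective
fit$varselect.pred     # selection for the prediction objective
```

The two final components correspond to the two objectives distinguished in the abstract: `varselect.interp` is meant for interpretation, `varselect.pred` for parsimonious prediction. Results vary with the seed and with the forest sizes, so the selected indices should be read as one possible outcome rather than a fixed answer.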
Main file
Genuer_VSURF_RR2014.pdf (102.74 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01096237, version 1 (17-12-2014)

Identifiers

  • HAL Id: hal-01096237, version 1

Cite

R. Genuer, J.-M. Poggi, C. Tuleau-Malot. VSURF : un package R pour la sélection de variables à l'aide de forêts aléatoires. 3èmes Rencontres R, 2014, Montpellier, France. ⟨hal-01096237⟩