Random forests and big data

Robin Genuer; Jean-Michel Poggi; Christine Tuleau-Malot; Nathalie Villa-Vialaneix

Conference paper, Year: 2015

Robin Genuer

Role: Author
PersonId : 1787
IdHAL : robin-genuer
IdRef : 15657490X

Statistics In System biology and Translational Medicine

Institut de Santé Publique, d'Epidémiologie et de Développement

Jean-Michel Poggi

Role: Author
PersonId : 966903

Laboratoire de Mathématiques d'Orsay

Christine Tuleau-Malot

Role: Author
PersonId : 8956
IdHAL : christine-malot
IdRef : 194152928

Laboratoire Jean Alexandre Dieudonné

Nathalie Villa-Vialaneix

Role: Author
PersonId : 4221
IdHAL : nathalie-vialaneix
ORCID : 0000-0003-1156-0639
IdRef : 101680503

Unité de Mathématiques et Informatique Appliquées de Toulouse

Abstract

Big Data is one of the major challenges of statistical science and has numerous consequences from both algorithmic and theoretical viewpoints. Big Data always involves massive data, but it also often includes data streams and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Random forests, introduced by Breiman in 2001, are based on decision trees combined with aggregation and bootstrap ideas. They are a powerful nonparametric statistical method that handles regression problems as well as two-class and multi-class classification problems within a single, versatile framework. This paper reviews the available proposals for random forests in parallel environments and for online random forests. We then formulate various remarks and sketch some alternative directions for random forests in the Big Data context.
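The abstract describes random forests as decision trees combined with bootstrap and aggregation ideas. The following Python sketch illustrates that combination under simplifying assumptions: each tree is reduced to a depth-one split (a "stump"), only classification is covered, and the toy data are made up. All function names and parameters (`fit_stump`, `fit_forest`, `n_features_tried`, and so on) are hypothetical; this is not the authors' implementation, nor any of the parallel or online variants the paper reviews.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most frequent label (the aggregation rule for classification)."""
    return Counter(labels).most_common(1)[0][0]


def bootstrap_sample(X, y, rng):
    """Draw len(X) observations with replacement (the bootstrap step)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_stump(X, y, rng, n_features_tried):
    """Fit a depth-one tree, searching only a random subset of the features."""
    default = majority(y)
    best = None  # (errors, feature, threshold, left_label, right_label)
    for j in rng.sample(range(len(X[0])), n_features_tried):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send observations to both sides
            ll, rl = majority(left), majority(right)
            errors = sum(lab != ll for lab in left) + sum(lab != rl for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, ll, rl)
    if best is None:  # no valid split: always predict the majority class
        return lambda row: default
    _, j, t, ll, rl = best
    return lambda row: ll if row[j] <= t else rl


def fit_forest(X, y, n_trees=25, n_features_tried=1, seed=0):
    """Grow n_trees stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        forest.append(fit_stump(Xb, yb, rng, n_features_tried))
    return forest


def predict(forest, row):
    """Aggregate the individual tree predictions by majority vote."""
    return majority([tree(row) for tree in forest])


# Toy two-class data, invented for illustration:
# class 0 has small values on both features, class 1 has large ones.
X = [[0.1, 1.0], [0.2, 1.5], [0.3, 1.2], [0.4, 1.1],
     [0.6, 3.0], [0.7, 3.5], [0.8, 3.2], [0.9, 3.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict(forest, [0.0, 0.0]), predict(forest, [1.0, 4.0]))
```

In this sketch each tree depends only on its own bootstrap sample, so the loop in `fit_forest` is the natural unit to distribute when trees are grown in parallel.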

Keywords

Random forests; Data streams; Big Data

Domains

Statistics [math.ST]

Main file

genuer_etal_jds2015.pdf (83.09 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01160643

Submitted on: Monday, June 8, 2015, 10:11:30

Last modified on: Thursday, March 14, 2024, 03:10:53

Long-term archiving on: Tuesday, September 15, 2015, 11:57:11

Dates and versions

hal-01160643 , version 1 (08-06-2015)

Identifiers

HAL Id : hal-01160643 , version 1
PRODINRA : 306579

Cite

Robin Genuer, Jean-Michel Poggi, Christine Tuleau-Malot, Nathalie Villa-Vialaneix. Random forests and big data. 47ème Journées de Statistique de la SFdS, Société Française de Statistique, Jun 2015, Lille, France. ⟨hal-01160643⟩

Collections

INSERM CNRS INRIA INRA DIEUDONNE LM-ORSAY INRIA2 UNIV-PARIS-SACLAY UNIV-COTEDAZUR INRAE U1219 GS-MATHEMATIQUES INRAEOCCITANIETOULOUSE MATHNUM MIAT

305 views

1219 downloads
