COREclust: a new package for a robust and scalable analysis of complex data - Université Toulouse III - Paul Sabatier - Toulouse INP
Preprint, Working Paper Year: 2018

COREclust: a new package for a robust and scalable analysis of complex data

Abstract

In this paper, we present a new R package, COREclust, dedicated to the detection of representative variables in high-dimensional spaces with a potentially limited number of observations. Variable set detection is based on an original graph clustering strategy, the CORE-clustering algorithm, which detects CORE-clusters, i.e. variable sets that have a user-defined size range and in which each variable is very similar to at least one other variable. Representative variables are then robustly estimated as the CORE-cluster centers. This strategy is entirely coded in C++ and wrapped in R using the Rcpp package. Particular effort has been dedicated to keeping its algorithmic cost reasonable so that it can be used on large datasets. After motivating our work, we explain the CORE-clustering algorithm as well as a greedy extension of this algorithm. We then present how to use the package and the results obtained on synthetic and real data.
Main file
ChampionEtAl2018_HAL.pdf (1.19 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01799117, version 1 (24-05-2018)

Identifiers

Cite

Camille Champion, Anne-Claire Brunet, Jean-Michel Loubes, Laurent Risser. COREclust: a new package for a robust and scalable analysis of complex data. 2018. ⟨hal-01799117⟩