COREclust: a new package for a robust and scalable analysis of complex data

Abstract : In this paper, we present a new R package, COREclust, dedicated to the detection of representative variables in high-dimensional spaces with a potentially limited number of observations. Variable set detection is based on an original graph clustering strategy, the CORE-clustering algorithm, which detects CORE-clusters, i.e. variable sets that have a user-defined size range and in which each variable is very similar to at least one other variable. Representative variables are then robustly estimated as the CORE-cluster centers. This strategy is entirely coded in C++ and wrapped by R using the Rcpp package. Particular effort has been dedicated to keeping its algorithmic cost reasonable so that it can be used on large datasets. After motivating our work, we explain the CORE-clustering algorithm as well as a greedy extension of this algorithm. We then present how to use it and the results obtained on synthetic and real data.
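
To make the notion of a CORE-cluster more concrete, the sketch below groups variables so that each one is strongly correlated with at least one other variable of its group, keeps only the groups whose size lies in a user-defined range, and picks one center per group. It is a simplified illustration in Python, not the COREclust implementation or its API: the function name toy_core_clusters, the min_similarity threshold, the use of absolute Pearson correlation as the similarity, the greedy merging rule and the choice of the center are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def toy_core_clusters(variables, min_size, max_size, min_similarity):
    """Hypothetical helper, NOT the COREclust algorithm.

    variables: dict mapping a variable name to its list of observations.
    Returns a list of (center, sorted_members) tuples.
    """
    names = list(variables)
    sim = {}
    for u, v in combinations(names, 2):
        sim[u, v] = sim[v, u] = abs(pearson(variables[u], variables[v]))

    parent = {v: v for v in names}    # union-find forest
    members = {v: [v] for v in names}  # root -> variables in its group

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    # Visit variable pairs from the most to the least similar one.
    edges = sorted(combinations(names, 2), key=lambda p: sim[p], reverse=True)
    for u, v in edges:
        if sim[u, v] < min_similarity:
            break  # assumption: weaker links never join two groups
        a, b = find(u), find(v)
        # Merge only if the merged group respects the maximum size.
        if a != b and len(members[a]) + len(members[b]) <= max_size:
            parent[b] = a
            members[a].extend(members.pop(b))

    clusters = []
    for group in members.values():
        if len(group) >= min_size:
            # Assumed center: the variable most similar to the rest of its group.
            center = max(
                group,
                key=lambda v: sum(sim[v, w] for w in group if w != v),
            )
            clusters.append((center, sorted(group)))
    return clusters


# Toy data: x1..x3 move together, y1..y2 move together, z is on its own.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    "x3": [2, 4.1, 6.2, 7.9, 10.1, 12],
    "y1": [5, 1, 4, 2, 6, 3],
    "y2": [5.2, 0.9, 4.1, 2.2, 5.9, 3.1],
    "z": [3, 3, 1, 5, 2, 4],
}
clusters = toy_core_clusters(data, min_size=2, max_size=3, min_similarity=0.9)
for center, group in clusters:
    print(center, group)
```

In this toy setting, a variable such as z that is not strongly correlated with any other variable stays alone and is dropped by the minimum size constraint, which mirrors the idea that only sufficiently large and coherent variable sets are retained.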

https://hal.archives-ouvertes.fr/hal-01799117
Contributor : Laurent Risser
Submitted on : Thursday, May 24, 2018 - 1:36:45 PM
Last modification on : Friday, October 25, 2019 - 1:58:08 AM
Long-term archiving on : Saturday, August 25, 2018 - 1:52:28 PM

Files

ChampionEtAl2018_HAL.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01799117, version 1
  • ARXIV : 1805.10211

Citation

Camille Champion, Anne-Claire Brunet, Jean-Michel Loubes, Laurent Risser. COREclust: a new package for a robust and scalable analysis of complex data. 2018. ⟨hal-01799117⟩
