A binned technique for scalable model-based clustering on huge datasets - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2020

Abstract

Clustering is affected by the steady growth of sample sizes, which offers the opportunity to reveal information previously out of reach. However, such data volumes raise issues of their own: they require substantial computational resources and lead to high energy consumption. Resorting to binned data built on an adaptive grid is expected to address these green-computing issues without harming the quality of the resulting estimation. After a brief review of existing methods, a first application to univariate model-based clustering is presented, with a numerical illustration of its advantages. Finally, an initial formalization of the multivariate extension is given, highlighting both the issues it raises and possible strategies.
Main file
Short_papermbc2.pdf (108.78 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03097284, version 1 (05-01-2021)
hal-03097284, version 2 (05-01-2022)

Identifiers

  • HAL Id: hal-03097284, version 1

Cite

Filippo Antonazzo, Christophe Biernacki, Christine Keribin. A binned technique for scalable model-based clustering on huge datasets. MBC2 - Models and Learning for Clustering and Classification, Sep 2020, Catania, Italy. ⟨hal-03097284v1⟩