Model-based clustering of Gaussian copulas for mixed data

Matthieu Marbac; Christophe Biernacki; Vincent Vandewalle

Preprint, Working Paper. Year: 2014


Matthieu Marbac

Role: Author
PersonId : 936866

MOdel for Data Analysis and Learning

Christophe Biernacki

Role: Author

MOdel for Data Analysis and Learning

Laboratoire Paul Painlevé - UMR 8524

Vincent Vandewalle

Role: Author
PersonId : 6383
IdHAL : vincent-vandewalle
ORCID : 0000-0003-2946-9059
IdRef : 14348091X

MOdel for Data Analysis and Learning

Abstract

A mixture model of Gaussian copulas is introduced to cluster mixed-type data, that is, data sets composed of variables of different natures. The analysis can thus be performed on data sets composed of any kind of variables admitting a cumulative distribution function. Copulas are used to model the intra-class dependencies while allowing any distribution for the one-dimensional margins of each component. In this work, each component follows a Gaussian copula, which provides one correlation coefficient per pair of variables and per class. Moreover, the one-dimensional margins of each component follow classical parametric distributions, which facilitates the interpretation of the model. This model generalizes many well-known models and yields meaningful data visualization as a straightforward by-product. A Metropolis-within-Gibbs sampler performs the Bayesian inference and avoids the difficulties related to estimating the parameters of copulas with discrete margins. Experiments on simulated and real data illustrate the advantages of the model: flexible parameters (one-dimensional margins and correlation matrices) combined with visualization capabilities.
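To make the generative structure described in the abstract concrete, the following is a minimal sketch of how one might simulate mixed data from a mixture of Gaussian copulas. It is not the authors' code and does not cover their Metropolis-within-Gibbs inference. The choice of margins (Gaussian, Poisson, Bernoulli), every parameter value, and the helper names `cholesky`, `poisson_quantile`, `sample_component`, and `sample_mixture` are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian on the copula's latent scale
EPS = 1e-12                # keeps uniforms strictly inside (0, 1)


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def poisson_quantile(u, rate):
    """Smallest integer k such that P(X <= k) >= u for X ~ Poisson(rate)."""
    k = 0
    pmf = math.exp(-rate)
    cdf = pmf
    while cdf < u and k < 10_000:  # the cap guards against rounding stalls
        k += 1
        pmf *= rate / k
        cdf += pmf
    return k


def sample_component(params, rng):
    """Draw one (continuous, count, binary) observation from one class."""
    # Recomputed on every call for simplicity; cache it in real code.
    lower = cholesky(params["correlation"])
    noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
    # Correlated latent Gaussian vector: latent = lower * noise.
    latent = [sum(lower[i][k] * noise[k] for k in range(i + 1)) for i in range(3)]
    # Standard Gaussian cdf maps each latent coordinate to a uniform.
    u = [min(max(STD_NORMAL.cdf(z), EPS), 1.0 - EPS) for z in latent]
    # Each margin applies its own quantile function to its uniform.
    continuous = NormalDist(params["mean"], params["sd"]).inv_cdf(u[0])
    count = poisson_quantile(u[1], params["rate"])
    binary = 1 if u[2] > 1.0 - params["prob_one"] else 0
    return continuous, count, binary


def sample_mixture(n, proportions, components, rng):
    """Draw n rows (label, continuous, count, binary) from the mixture."""
    data = []
    for _ in range(n):
        label = rng.choices(range(len(components)), weights=proportions)[0]
        data.append((label,) + sample_component(components[label], rng))
    return data


# Hypothetical two-class example; all values are illustrative assumptions.
components = [
    {"correlation": [[1.0, 0.8, 0.3], [0.8, 1.0, 0.2], [0.3, 0.2, 1.0]],
     "mean": 0.0, "sd": 1.0, "rate": 2.0, "prob_one": 0.3},
    {"correlation": [[1.0, -0.5, 0.0], [-0.5, 1.0, 0.0], [0.0, 0.0, 1.0]],
     "mean": 4.0, "sd": 0.5, "rate": 8.0, "prob_one": 0.7},
]
data = sample_mixture(2000, [0.6, 0.4], components, random.Random(0))
```

In this sketch each class carries its own correlation matrix and its own margin parameters, which mirrors the abstract's description of one correlation coefficient per pair of variables and per class. The discrete margins are obtained by pushing the uniforms through step-shaped quantile functions, which is why the likelihood of such models is harder to evaluate than in the fully continuous case.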

Keywords

Clustering; Gaussian copula; Metropolis-within-Gibbs algorithm; Mixed data; Mixture models; Visualization

Domains

Methodology [stat.ME]

Main file

cluster_hetero_gaussian_copula.pdf (591.91 KB)

Origin: Files produced by the author(s)

Contributor: Matthieu Marbac

https://hal.science/hal-00987760

Submitted on: Wednesday, August 13, 2014, 18:06:47

Last modified on: Friday, April 19, 2024, 14:04:05

Long-term archiving on: Thursday, November 27, 2014, 00:48:01

Dates and versions

hal-00987760 , version 1 (06-05-2014)

hal-00987760 , version 2 (13-08-2014)

hal-00987760 , version 3 (29-09-2015)

hal-00987760 , version 4 (20-12-2016)

Identifiers

HAL Id : hal-00987760 , version 2

Cite

Matthieu Marbac, Christophe Biernacki, Vincent Vandewalle. Model-based clustering of Gaussian copulas for mixed data. 2014. ⟨hal-00987760v2⟩


