Model-based clustering of Gaussian copulas for mixed data
Abstract
A mixture model of Gaussian copulas is introduced to cluster mixed-type data (data sets composed of variables of different natures). The analysis can therefore be performed on data sets composed of any kind of variables admitting a cumulative distribution function. Copulas are used to model the intra-class dependencies while allowing any distribution for the one-dimensional margins of each component. In this work, each component follows a Gaussian copula, which provides one correlation coefficient per pair of variables and per class. Moreover, the one-dimensional margins of each component follow classical parametric distributions in order to facilitate model interpretation. This model generalizes many well-known models and allows meaningful data visualization as a straightforward by-product. A Metropolis-within-Gibbs sampler performs the Bayesian inference, avoiding the difficulties related to estimating the parameters of copulas with discrete margins. Experiments on simulated and real data illustrate the advantages of the model: flexible parameters (one-dimensional margins and correlation matrices) combined with visualization capabilities.
Origin: Files produced by the author(s)
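
As an illustration of the generative side of the model described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes two variables per observation (one continuous with a Gaussian margin, one count with a Poisson margin), a single correlation coefficient per class, and hypothetical names (`sample_component`, `sample_mixture`, `poisson_quantile`) and parameter values chosen only for the example. It uses the Python standard library only and does not cover the Metropolis-within-Gibbs inference.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps uniforms strictly inside (0, 1) for the quantile functions


def clamp(u):
    """Restrict u to [EPS, 1 - EPS] so that inverse CDFs are well defined."""
    return min(max(u, EPS), 1.0 - EPS)


def poisson_quantile(u, lam, max_k=10_000):
    """Smallest integer k with P(X <= k) >= u for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u and k < max_k:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def sample_component(n, rho, mu, sigma, lam, rng):
    """Draw n (continuous, count) pairs from one Gaussian-copula component.

    The dependency comes from a latent bivariate Gaussian with correlation
    rho; each coordinate is then mapped to its own margin through the
    probability integral transform.
    """
    rows = []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        z1 = e1
        z2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2  # corr(z1, z2) = rho
        u1 = clamp(STD_NORMAL.cdf(z1))
        u2 = clamp(STD_NORMAL.cdf(z2))
        x1 = NormalDist(mu, sigma).inv_cdf(u1)  # Gaussian margin
        x2 = poisson_quantile(u2, lam)          # Poisson margin
        rows.append((x1, x2))
    return rows


def sample_mixture(n, weights, components, seed=0):
    """Draw n observations and their class labels from the mixture."""
    rng = random.Random(seed)
    data, labels = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        data.append(sample_component(1, rng=rng, **components[k])[0])
        labels.append(k)
    return data, labels


# Illustrative (assumed) parameters: two classes with opposite dependencies.
components = [
    {"rho": 0.8, "mu": 0.0, "sigma": 1.0, "lam": 2.0},
    {"rho": -0.5, "mu": 4.0, "sigma": 0.5, "lam": 10.0},
]
data, labels = sample_mixture(500, [0.6, 0.4], components, seed=42)
```

Each class keeps its own margins (`mu`, `sigma`, `lam`) and its own correlation `rho`, which mirrors the "one correlation coefficient per pair of variables and per class" structure. With more than two variables, the hand-written bivariate step would be replaced by a Cholesky factor of the class correlation matrix.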