Journal article, Journal de la Société Française de Statistique, Year: 2012

SelvarClustMV: Variable selection approach in model-based clustering allowing for missing values

Abstract

Clustering methods abound, but none was devised to combine a variable selection procedure with the management of missing data. Yet in microarray datasets, genes are described by a growing number of experiments, and missing data are always present. It is also important to detect the relevant experiments in order to improve both the gene clustering and the interpretation of the data. A common practice is to remove genes with missing values or to replace missing values with estimates. However, this pre-processing is known to have an important impact on the clustering result. We tackle variable selection and missing data in a single statistical framework: a versatile variable selection model based on multidimensional Gaussian mixtures is proposed, taking the roles of the variables for clustering into account. Moreover, this statistical framework handles missing values without requiring any data pre-processing. Numerical experiments highlight the gain of our method compared with imputation methods, which fail to recover the true variable roles and sometimes lose biological information.
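
As an illustration of how a Gaussian mixture can score incomplete rows without imputation, one standard approach is to integrate the missing coordinates out: the marginal of a multivariate Gaussian over the observed coordinates is again Gaussian, with the matching sub-vector of the mean and sub-matrix of the covariance. The sketch below assumes values are missing at random and encoded as NaN. It is not the SelvarClustMV implementation, it does not perform variable selection, and the function names `observed_log_density` and `mixture_log_likelihood` are hypothetical.

```python
import numpy as np


def observed_log_density(x, mean, cov):
    """Log-density of the observed coordinates of x under N(mean, cov).

    Missing entries are encoded as NaN and integrated out, which for a
    Gaussian amounts to dropping the matching entries of mean and the
    matching rows and columns of cov.
    """
    obs = ~np.isnan(x)
    if not obs.any():
        return 0.0  # a fully missing row contributes nothing
    diff = x[obs] - mean[obs]
    sub_cov = cov[np.ix_(obs, obs)]
    _, logdet = np.linalg.slogdet(sub_cov)
    maha = diff @ np.linalg.solve(sub_cov, diff)
    return -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + maha)


def mixture_log_likelihood(X, weights, means, covs):
    """Observed-data log-likelihood of a Gaussian mixture over the rows of X."""
    total = 0.0
    for x in X:
        comp = np.array([
            np.log(w) + observed_log_density(x, m, c)
            for w, m, c in zip(weights, means, covs)
        ])
        top = comp.max()
        total += top + np.log(np.exp(comp - top).sum())  # stable log-sum-exp
    return total


# Toy data (made up for illustration): two rows, one with a missing value.
X = np.array([[0.0, np.nan],
              [1.0, 2.0]])
weights = [0.5, 0.5]
means = [np.zeros(2), np.array([1.0, 2.0])]
covs = [np.eye(2), np.eye(2)]
print(mixture_log_likelihood(X, weights, means, covs))
```

Every row is used as it stands, so no gene has to be discarded or completed before the likelihood is evaluated.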
No file deposited

Dates and versions

hal-00972003, version 1 (03-04-2014)

Identifiers

  • HAL Id: hal-00972003, version 1
  • PRODINRA: 326956

Cite

Cathy Maugis-Rabusseau, Marie-Laure Martin-Magniette, Sandra Pelletier. SelvarClustMV: Variable selection approach in model-based clustering allowing for missing values. Journal de la Société Française de Statistique, 2012, 15 (2), pp.21-36. ⟨hal-00972003⟩