Preprint, Working Paper. Year: 2010

A new penalized criterion for variable selection and clustering using genotypic data

Abstract

We consider the problem of estimating the number of components and the relevant variables in a mixture model for multilocus genotypic data. A new penalized maximum likelihood criterion is proposed, and a non-asymptotic oracle inequality is obtained. Further, under weak assumptions on the true probability underlying the observations, the selected model is asymptotically consistent. On the practical side, the shape of our proposed penalty function is defined up to a multiplicative constant, which is calibrated by the slope heuristics in an automatic data-driven procedure. Using simulated data, we found that this calibration improves the performance of the selection procedure with respect to classical criteria such as BIC and AIC. The new criterion gives an answer to the question "Which criterion for which sample size?".
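The calibration step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, hypothetically, that the penalty shape is simply the model dimension (the abstract does not give the actual shape) and that the constant is estimated by a least-squares fit of the maximized log-likelihood against that shape over the most complex models, which is one common variant of the slope heuristics. The function names `calibrate_penalty` and `select_model` and the parameter `n_largest` are illustrative only.

```python
import numpy as np

def calibrate_penalty(dims, loglik, n_largest=None):
    """Return a data-driven penalty constant (twice the estimated slope).

    Assumption: for the most complex models, the maximized log-likelihood
    grows roughly linearly in the penalty shape (here, the dimension).
    """
    dims = np.asarray(dims, dtype=float)
    loglik = np.asarray(loglik, dtype=float)
    order = np.argsort(dims)
    dims, loglik = dims[order], loglik[order]
    # Fit only on the largest models (default: the upper half of the collection).
    k = n_largest or max(2, len(dims) // 2)
    slope, _intercept = np.polyfit(dims[-k:], loglik[-k:], 1)
    # Slope heuristics: the final penalty is twice the minimal penalty.
    return 2.0 * slope

def select_model(dims, loglik, n_largest=None):
    """Return (index of the selected model, calibrated constant)."""
    kappa = calibrate_penalty(dims, loglik, n_largest)
    # Penalized criterion: minus log-likelihood plus constant times shape.
    crit = -np.asarray(loglik, dtype=float) + kappa * np.asarray(dims, dtype=float)
    return int(np.argmin(crit)), kappa
```

Fitting the slope only on the largest models reflects the idea that their bias is negligible, so the remaining growth of the log-likelihood is attributable to overfitting; how many models to include in the fit is a tuning choice that this sketch leaves to the caller.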
Main file
PenaltyCalibration.pdf (358.65 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00453236, version 1 (04-02-2010)
hal-00453236, version 2 (19-10-2010)
hal-00453236, version 3 (08-03-2014)

Cite

Dominique Bontemps, Wilson Toussile. A new penalized criterion for variable selection and clustering using genotypic data. 2010. ⟨hal-00453236v1⟩