Multivariate multinomial mixtures: a data-driven penalized criterion for variable selection and clustering

Dominique Bontemps; Wilson Toussile

Preprint, Working Paper, Year: 2010

Dominique Bontemps

Role: Author
PersonId: 743708
IdHAL: dominique-bontemps
ORCID: 0009-0007-5460-7050

Laboratoire de Mathématiques d'Orsay

Wilson Toussile

Role: Author
PersonId: 856305

Laboratoire de Mathématiques d'Orsay

Abstract

We consider the problem of estimating the number of components and the relevant variables in a multivariate multinomial mixture. Models of this kind arise in particular when dealing with multilocus genotypic data. A new penalized maximum likelihood criterion is proposed, and a non-asymptotic oracle inequality is obtained. Further, under weak assumptions on the true probability underlying the observations, the selected model is asymptotically consistent. In practice, the shape of the proposed penalty function is defined up to a multiplicative parameter, which is calibrated by the slope heuristics in an automatic data-driven procedure. On simulated data, this procedure improves the performance of model selection relative to classical criteria such as BIC and AIC. The new criterion thus provides an answer to the question "Which criterion for which sample size?".
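
The calibration step mentioned in the abstract can be illustrated with a generic slope-heuristics sketch. This is not the authors' procedure: it assumes a penalty proportional to the model dimension, estimates the minimal penalty constant as the slope of the maximized log-likelihood against dimension over the most complex models, and then uses twice that constant. The function name `slope_heuristic_select` and the parameter `n_fit` are hypothetical and introduced only for illustration.

```python
import numpy as np


def slope_heuristic_select(dims, loglik, n_fit=None):
    """Pick a model with a slope-heuristics-calibrated penalty.

    dims   : dimension (number of free parameters) of each candidate model
    loglik : maximized log-likelihood of each candidate model
    n_fit  : how many of the most complex models to use for the slope fit
             (assumption: defaults to the upper half of the collection)
    Returns (index of the selected model, estimated minimal penalty constant).
    """
    dims = np.asarray(dims, dtype=float)
    loglik = np.asarray(loglik, dtype=float)

    # Indices of the n_fit most complex models.
    order = np.argsort(dims)
    k = n_fit if n_fit is not None else max(2, len(dims) // 2)
    largest = order[-k:]

    # For complex models the log-likelihood is expected to grow roughly
    # linearly in the dimension; the slope estimates the minimal penalty.
    slope, _intercept = np.polyfit(dims[largest], loglik[largest], 1)

    # Slope heuristics: use twice the minimal penalty (assumed shape: dims).
    penalty = 2.0 * slope * dims
    criterion = -loglik + penalty
    return int(np.argmin(criterion)), float(slope)
```

In this sketch the sample size is absorbed into the fitted constant, so the same function applies whether the penalty is written per observation or in total. The quality of the calibration depends on having enough complex models in the collection for the linear regime to be visible.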

Keywords

Biostatistics; Latent class model; Multilocus genotypic data; Multivariate multinomial mixture; Penalized likelihood; Population genetics; Slope heuristics; Variable selection

Domains

Statistics [math.ST]; Theory [stat.TH]

Main file

PenaltyCalibration.pdf (376.07 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00453236

Submitted on: Tuesday, October 19, 2010, 15:54:07

Last modified on: Friday, March 24, 2023, 14:52:53

Long-term archiving on: Thursday, January 20, 2011, 02:48:51

Dates and versions

hal-00453236, version 1 (2010-02-04)

hal-00453236, version 2 (2010-10-19)

hal-00453236, version 3 (2014-03-08)

Identifiers

HAL Id: hal-00453236, version 2
arXiv: 1002.1142

Cite

Dominique Bontemps, Wilson Toussile. Multivariate multinomial mixtures: a data-driven penalized criterion for variable selection and clustering. 2010. ⟨hal-00453236v2⟩

292 Views

1070 Downloads
