Parameter-less co-clustering for star-structured heterogeneous data

Dino Ienco; Céline Robardet; Ruggero G. Pensa; Rosa Meo

doi:10.1007/s10618-012-0248-z

Journal article, Data Mining and Knowledge Discovery, Year: 2013


Dino Ienco

Role: Author
PersonId : 6226
IdHAL : dino-ienco
ORCID : 0000-0002-8736-3132
IdRef : 172688183

ADVanced Analytics for data SciencE

Territoires, Environnement, Télédétection et Information Spatiale

Céline Robardet

Role: Author
PersonId : 3355
IdHAL : celine-robardet
ORCID : 0000-0002-8583-9408
IdRef : 070207054

COMputational BIology and data miNING

Ruggero G. Pensa

Role: Author

Department of Computer Engineering

Rosa Meo

Role: Author

Department of Computer Engineering

Abstract

Data described by multiple features drawn from heterogeneous domains are increasingly common in real-world applications. Such data represent objects of a certain type, connected to other types of data, the features, so that the overall data schema forms a star structure of inter-relationships. Co-clustering these data involves the specification of many parameters, such as the number of clusters for the object dimension and for each of the feature domains. In this paper we present a novel co-clustering algorithm for heterogeneous star-structured data that is parameter-less: it requires neither the number of row clusters nor the number of column clusters for the given feature spaces. Our approach optimizes Goodman–Kruskal’s τ, a measure of cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend τ to evaluate co-clustering solutions and, in particular, apply it in a higher-dimensional setting. We propose the algorithm CoStar, which optimizes τ by a local search approach. We assess the performance of CoStar on publicly available datasets from the textual and image domains using objective external criteria. The results show that our approach outperforms state-of-the-art methods for the co-clustering of heterogeneous data while remaining computationally efficient.
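As an illustration of the measure the abstract refers to, the sketch below computes the standard two-variable Goodman–Kruskal τ on a single contingency table, using the textbook form τ = (Σᵢ Σⱼ pᵢⱼ² / pᵢ. − Σⱼ p.ⱼ²) / (1 − Σⱼ p.ⱼ²), which measures how much knowing the row category reduces the error in predicting the column category. This is not the CoStar algorithm or the paper's extension of τ to co-clustering and star-structured data; the function name `goodman_kruskal_tau` and the list-of-lists table layout are assumptions made for this example.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column variable from the row variable.

    `table` is a contingency table given as a list of rows of non-negative counts
    (a hypothetical layout chosen for this sketch).
    """
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("contingency table must contain at least one observation")

    n_cols = len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(row[j] for row in table) for j in range(n_cols)]

    # Sum of squared column marginals: chance of a correct guess without row knowledge.
    marginal = sum((c / total) ** 2 for c in col_sums)
    if marginal == 1.0:
        raise ValueError("tau is undefined when only one column category occurs")

    # Same quantity, but conditioned on the row category (empty rows are skipped).
    conditional = sum(
        (cell / total) ** 2 / (r / total)
        for row, r in zip(table, row_sums) if r > 0
        for cell in row
    )
    return (conditional - marginal) / (1.0 - marginal)


if __name__ == "__main__":
    print(goodman_kruskal_tau([[10, 0], [0, 10]]))  # rows fully determine columns
    print(goodman_kruskal_tau([[5, 5], [5, 5]]))    # rows carry no information
```

In a co-clustering setting, one would presumably evaluate such a measure on the table obtained by aggregating counts over row clusters and column clusters; how the paper does this for several feature spaces at once is described in the article itself, not in this sketch.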

Keywords

CO-CLUSTERING; STAR-STRUCTURED DATA; MULTI-VIEW

DATA ANALYSIS; HETEROGENEITY; COMPUTER ANALYSIS; ALGORITHM

Domains

Computer Science [cs]

Main file

mt2012-pub00036393.pdf (1.19 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-00794744

Submitted on: Tuesday, February 26, 2013, 14:40:35

Last modified on: Tuesday, March 12, 2024, 10:45:00

Long-term archiving on: Sunday, April 2, 2017, 05:20:13

Dates and versions

hal-00794744 , version 1 (26-02-2013)

Identifiers

HAL Id : hal-00794744 , version 1
DOI : 10.1007/s10618-012-0248-z
IRSTEA : PUB00036393

Cite

Dino Ienco, Céline Robardet, Ruggero G. Pensa, Rosa Meo. Parameter-less co-clustering for star-structured heterogeneous data. Data Mining and Knowledge Discovery, 2013, 26 (2), pp.217-254. ⟨10.1007/s10618-012-0248-z⟩. ⟨hal-00794744⟩

Collections

CIRAD AGROPARISTECH CNRS INRIA UNIV-LYON1 UNIV-LYON2 INSA-LYON EC-LYON IRSTEA LIRIS ADVANSE LIRMM AGROPOLIS INRIA2 TETIS MIPS LABEXIMU UNIV-MONTPELLIER INSA-GROUPE UDL INRAE INRAEOCCITANIEMONTPELLIER MATHNUM

751 views

595 downloads