An Ontology-Based Method for Duplicate Detection in Web Data Tables

Patrice Buche; Juliette Dibie-Barthelemy; Rania Khefifi; Fatiha Saïs

doi:10.1007/978-3-642-23088-2_38

Conference paper, Year: 2011


Patrice Buche

Role: Author
PersonId : 9239
IdHAL : patrice-buche
ORCID : 0000-0002-9134-5404
IdRef : 031424910

Graphs for Inferences on Knowledge

Juliette Dibie-Barthelemy

Role: Author
PersonId : 7514
IdHAL : juliette-dibie
ORCID : 0000-0003-0395-1306
IdRef : 165112034

Méthodologies d'Analyse de Risque Alimentaire

Rania Khefifi

Role: Author

Ingénierie des Agro-polymères et Technologies Émergentes

Laboratoire de Recherche en Informatique

Fatiha Saïs

Role: Author
PersonId : 2805
IdHAL : fatihasais
ORCID : 0000-0002-6995-2785
IdRef : 124298036

Laboratoire de Recherche en Informatique

Distributed and heterogeneous data and knowledge

Abstract

We present a method for detecting duplicates in semantically annotated Web data tables, driven by a domain Termino-Ontological Resource (TOR). The method relies on the fuzzy semantic annotations that are automatically associated with each row of a Web data table. Each annotation instantiates a composed concept of the domain TOR, which represents the semantic n-ary relationship between the columns of the Web data table, and contains fuzzy values expressed as fuzzy sets. The automatic duplicate detection method identifies pairs of duplicate fuzzy semantic annotations and relies on (i) knowledge declared in the domain TOR and (ii) similarity measures between fuzzy sets. Two new similarity measures are defined to compare both the symbolic fuzzy values and the numerical fuzzy values. The method has been tested on a real application in the domain of chemical risk in food.
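As a rough illustration of the idea of comparing row annotations through fuzzy-set similarity, the following is a minimal sketch only. It does not reproduce the two similarity measures defined in the paper, which the abstract does not specify, and it ignores the knowledge declared in the TOR. It assumes a generic min/max (Jaccard-style) similarity for symbolic fuzzy sets and a simple support-overlap ratio for numerical fuzzy values. The names `fuzzy_jaccard`, `interval_overlap`, `duplicate_pairs`, the dictionary keys, the threshold and the sample rows are all hypothetical.

```python
from itertools import combinations


def fuzzy_jaccard(a: dict[str, float], b: dict[str, float]) -> float:
    """Generic min/max similarity between two discrete fuzzy sets.

    Assumption: a stand-in measure, not the one defined in the paper.
    """
    keys = a.keys() | b.keys()
    inter = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return inter / union if union else 1.0


def interval_overlap(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Overlap of two supports [lo, hi] divided by the length of their hull.

    Assumption: a numerical fuzzy value is reduced to its support interval.
    """
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    hull = max(a[1], b[1]) - min(a[0], b[0])
    if hull == 0:  # both intervals are the same single point
        return 1.0
    return max(0.0, hi - lo) / hull


def duplicate_pairs(annotations: list[dict], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs whose symbolic and numerical similarities both reach the threshold."""
    pairs = []
    for (i, x), (j, y) in combinations(enumerate(annotations), 2):
        s_sym = fuzzy_jaccard(x["symbolic"], y["symbolic"])
        s_num = interval_overlap(x["numeric"], y["numeric"])
        if min(s_sym, s_num) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical row annotations, invented for illustration only.
rows = [
    {"symbolic": {"wheat": 1.0, "cereal": 0.5}, "numeric": (10.0, 20.0)},
    {"symbolic": {"wheat": 1.0, "cereal": 0.5}, "numeric": (10.0, 19.0)},
    {"symbolic": {"rice": 1.0}, "numeric": (50.0, 60.0)},
]
print(duplicate_pairs(rows))
```

Taking the minimum of the two scores is one possible design choice: a pair is flagged only when every compared value is similar enough. A weighted combination would be an equally plausible alternative.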

Domains

Web

Main file

DEXA2011_final_1.pdf (1.51 MB)

Origin: Files produced by the author(s)


https://hal-lirmm.ccsd.cnrs.fr/lirmm-00611944

Submitted on: Wednesday, June 3, 2020, 09:47:50

Last modified on: Saturday, March 23, 2024, 18:14:07

Long-term archiving on: Thursday, December 3, 2020, 02:59:12

Dates and versions

lirmm-00611944 , version 1 (03-06-2020)

Identifiers

HAL Id : lirmm-00611944 , version 1
DOI : 10.1007/978-3-642-23088-2_38
PRODINRA : 163005

Cite

Patrice Buche, Juliette Dibie-Barthelemy, Rania Khefifi, Fatiha Saïs. An Ontology-Based Method for Duplicate Detection in Web Data Tables. DEXA 2011 - 22nd International Conference on Database and Expert Systems Applications, Aug 2011, Toulouse, France. pp.511-525, ⟨10.1007/978-3-642-23088-2_38⟩. ⟨lirmm-00611944⟩


Collections

CIRAD AGROPARISTECH EC-PARIS CNRS INRIA INRA IATE PARISTECH UMR8623 GRAPHIK LIRMM MIA-PARIS INRIA2 UNIV-PARIS-SACLAY MIPS BA UNIV-MONTPELLIER INSTITUT-AGRO-MONTPELLIER INRAE INRAEOCCITANIEMONTPELLIER MATHNUM

331 views

105 downloads
