Biomedical concept extraction based on combining the content-based and word order similarities

Duy Dinh; Lynda Tamine

doi:10.1145/1982185.1982438

Conference paper, Year: 2011

Duy Dinh

Role: Author
PersonId: 888518

Systèmes d’Informations Généralisées

Lynda Tamine

Role: Author
PersonId: 744669
IdHAL: lynda-tamine-lechani
ORCID: 0000-0002-3615-8032
IdRef: 110204875

Systèmes d’Informations Généralisées

Université Toulouse III - Paul Sabatier

Abstract

The main objective of conceptual retrieval models is to go beyond simple term matching by relaxing the term independence assumption through concept recognition. In this paper, we present an approach to semantic indexing and retrieval of biomedical documents that identifies domain concepts drawn from the Medical Subject Headings (MeSH) thesaurus. Our indexing approach relies on a purely statistical vector space model, which represents medical documents and MeSH concepts as term vectors. By combining the bag-of-words concept representation with word positions in the text, we demonstrate that our mapping method is able to extract valuable concepts from documents. The output of this semantic mapping serves as the input to our document relevance scoring in response to a query. Experiments on the OHSUMED collection show that our semantic indexing method significantly outperforms state-of-the-art baselines that employ word or term statistics.
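
The idea of combining a content-based similarity with a word order similarity can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything below is an assumption made for illustration: the function names, the use of raw term frequencies for the cosine measure, the pairwise-order measure, and the mixing weight `alpha` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(doc_tokens, concept_tokens):
    """Content-based similarity: cosine between raw term-frequency vectors."""
    d, c = Counter(doc_tokens), Counter(concept_tokens)
    dot = sum(d[t] * c[t] for t in c)
    norm = math.sqrt(sum(v * v for v in d.values())) * math.sqrt(
        sum(v * v for v in c.values())
    )
    return dot / norm if norm else 0.0


def word_order_similarity(doc_tokens, concept_tokens):
    """Fraction of concept word pairs whose first occurrences in the
    document appear in the same relative order as in the concept entry.
    (An illustrative measure, not the one defined in the paper.)"""
    first_pos = {}
    for i, t in enumerate(doc_tokens):
        first_pos.setdefault(t, i)  # remember only the first occurrence
    positions = [first_pos[t] for t in concept_tokens if t in first_pos]
    if not positions:
        return 0.0  # no concept word occurs in the document
    if len(positions) == 1:
        return 1.0  # a single matched word is trivially in order
    pairs = len(positions) * (len(positions) - 1) // 2
    in_order = sum(
        1
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if positions[i] < positions[j]
    )
    return in_order / pairs


def combined_score(doc_tokens, concept_tokens, alpha=0.5):
    """Linear mix of the two similarities; alpha is a hypothetical weight."""
    return alpha * cosine_similarity(doc_tokens, concept_tokens) + (
        1 - alpha
    ) * word_order_similarity(doc_tokens, concept_tokens)


if __name__ == "__main__":
    doc = "patients with breast cancer received therapy".split()
    for concept in (["breast", "cancer"], ["cancer", "breast"]):
        print(concept, round(combined_score(doc, concept), 3))
```

In this sketch both candidate entries have the same bag-of-words similarity to the document, so only the word order component separates them: the entry whose words appear in document order receives the higher combined score.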

Keywords

Semantic Indexing; Concept Recognition; Biomedical Information Retrieval; Document Expansion

Domains

Information Retrieval [cs.IR]

Main file

SAC2011_Duy_Dinh_Lynda_Tamine.pdf (117.31 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-00588335

Submitted on: Friday, April 22, 2011, 22:38:21

Last modified on: Monday, November 20, 2023, 11:44:21

Long-term archiving on: Saturday, July 23, 2011, 02:50:06

Dates and versions

hal-00588335, version 1 (22-04-2011)

Identifiers

HAL Id: hal-00588335, version 1
DOI: 10.1145/1982185.1982438

Cite

Duy Dinh, Lynda Tamine. Biomedical concept extraction based on combining the content-based and word order similarities. ACM Symposium on Applied Computing (SAC 2011), Mar 2011, TaiChung, Taiwan. pp. 1159-1163, ⟨10.1145/1982185.1982438⟩. ⟨hal-00588335⟩

Collections

UNIV-TLSE2 CNRS UT1-CAPITOLE IRIT IRIT-SIG IRIT-GD TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

156 views

358 downloads
