HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2010

HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID

Abstract

The SemEval task 5 was an opportunity for experimenting with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited for producing a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally, a post-ranking was realized based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
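
To make the ranking step in the abstract more concrete, here is a minimal sketch of bagging applied to candidate scoring. It is not the authors' implementation: it uses depth-1 trees (decision stumps) instead of full decision trees, and the feature layout (phraseness, informativeness, in_title), the toy training rows and all function names are hypothetical, chosen only for illustration. Each stump is trained on a bootstrap resample, and candidates are ranked by the average of the stump outputs.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(rows, labels):
    """Fit a depth-1 tree: keep the split with the lowest weighted Gini impurity."""
    prior = sum(labels) / len(labels)
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                # An empty left side falls back to the overall positive rate.
                p_left = sum(left) / len(left) if left else prior
                p_right = sum(right) / len(right)
                best = (score, f, t, p_left, p_right)
    _, f, t, p_left, p_right = best
    return lambda row: p_right if row[f] >= t else p_left

def fit_bagged(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps on bootstrap resamples and average their outputs."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return lambda row: sum(tree(row) for tree in trees) / len(trees)

def rank_candidates(model, candidates):
    """Sort (term, features) pairs by descending model score."""
    return sorted(candidates, key=lambda c: model(c[1]), reverse=True)

# Hypothetical toy data: [phraseness, informativeness, in_title], label 1 = key term.
train_rows = [
    [0.9, 0.8, 1], [0.7, 0.9, 1], [0.8, 0.7, 0],
    [0.2, 0.1, 0], [0.3, 0.2, 0], [0.1, 0.3, 1],
]
train_labels = [1, 1, 1, 0, 0, 0]

model = fit_bagged(train_rows, train_labels)
candidates = [
    ("this paper", [0.15, 0.1, 0]),
    ("key term extraction", [0.95, 0.95, 1]),
]
ranked = rank_candidates(model, candidates)
for term, features in ranked:
    print(term, round(model(features), 3))
```

Averaging over resampled models is what gives a graded score suitable for ranking, where a single tree would return only a handful of distinct values. The post-ranking based on keyword co-usage statistics mentioned in the abstract is not sketched here, since the abstract gives no detail on how it is computed.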
Main file
article.pdf (70.02 KB)
TermExtraction.pdf (1.31 MB)
Origin: Files produced by the author(s)
Format: Other

Dates and versions

inria-00493437, version 1 (18-06-2010)

Identifiers

  • HAL Id: inria-00493437, version 1

Cite

Patrice Lopez, Laurent Romary. HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID. SemEval 2010 Workshop, ACL SigLex event, Jul 2010, Uppsala, Sweden. 4 p. ⟨inria-00493437⟩

Collections

INRIA INRIA2
773 views
1940 downloads
