HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID

Patrice Lopez; Laurent Romary

Conference paper, Year: 2010


Patrice Lopez

Role: Author
PersonId: 2984
IdHAL: patricelopez
ORCID: 0000-0002-9959-9441
IdRef: 157929930

Institut für Deutsche Sprache und Linguistik

Inria Saclay - Ile de France

Laurent Romary

Role: Author
PersonId: 307
IdHAL: laurentromary
ORCID: 0000-0002-0756-0508
IdRef: 060702494

Institut für Deutsche Sprache und Linguistik

Inria Saclay - Ile de France

Abstract

SemEval Task 5 was an opportunity to experiment with the key term extraction module of GROBID, a system for extracting and generating bibliographical information from technical and scientific documents. The tool first uses GROBID's facilities for analyzing the structure of scientific articles, resulting in a first set of structural features. A second set of features captures content properties based on phraseness, informativeness and keywordness measures. Two knowledge bases, GRISP and Wikipedia, are then exploited to produce a last set of lexical/semantic features. Bagged decision trees appeared to be the most efficient machine learning algorithm for generating a list of ranked key term candidates. Finally, a post-ranking was performed based on statistics of co-usage of keywords in HAL, a large Open Access publication repository.
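The ranking step named in the abstract (bagged decision trees producing a ranked list of candidates) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no feature definitions or training details, so the two features (`in_title`, `norm_freq`), the toy data, and all function names below are hypothetical, and depth-1 trees (stumps) stand in for full decision trees to keep the example short.

```python
import random

def fit_stump(rows, labels):
    """Fit a depth-1 tree: choose the (feature, threshold) pair with the
    fewest training errors, predicting 1 when value >= threshold."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            errors = sum((r[f] >= t) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]

def fit_bagged_stumps(rows, labels, n_estimators=25, seed=0):
    """Bagging: train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    model = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        model.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return model

def score(model, feats):
    """Fraction of stumps that vote 'key term' for this candidate."""
    return sum(feats[f] >= t for f, t in model) / len(model)

def rank_candidates(model, candidates):
    """Return (term, score) pairs sorted by decreasing score."""
    scored = [(term, score(model, feats)) for term, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical training data: (in_title, norm_freq) per candidate, 1 = key term.
train_rows = [(1, 0.9), (1, 0.8), (0, 0.7), (0, 0.2), (0, 0.1), (1, 0.3)]
train_labels = [1, 1, 1, 0, 0, 0]

model = fit_bagged_stumps(train_rows, train_labels)
ranking = rank_candidates(model, {
    "key term extraction": (1, 0.95),  # hypothetical strong candidate
    "last set": (0, 0.05),             # hypothetical weak candidate
})
for term, s in ranking:
    print(f"{s:.2f}  {term}")
```

Averaging votes over bootstrap-trained trees yields a graded score instead of a hard yes/no decision, which is what makes the ensemble usable for ranking candidates.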

Domains

Computation and Language [cs.CL]

Main file

article.pdf (70.02 KB)

TermExtraction.pdf (1.31 MB)

Origin: Files produced by the author(s)

Format: Other

Contributor: Laurent Romary

https://inria.hal.science/inria-00493437

Submitted on: Friday, 18 June 2010, 17:17:19

Last modified on: Friday, 22 December 2023, 16:00:04

Long-term archiving on: Monday, 22 October 2012, 12:00:39

Dates and versions

inria-00493437, version 1 (18-06-2010)

Identifiers

HAL Id: inria-00493437, version 1

Cite

Patrice Lopez, Laurent Romary. HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID. SemEval 2010 Workshop, ACL SigLex event, Jul 2010, Uppsala, Sweden. 4 p. ⟨inria-00493437⟩


Collections

INRIA INRIA2

