Automatic Thesaurus Generation from Raw Text using Knowledge-Poor Techniques

Gregory Grefenstette

Conference paper. Year: 1993

Gregory Grefenstette

Role: Author
PersonId: 2537
IdHAL: gregory-grefenstette
ORCID: 0000-0001-8479-049X
IdRef: 075539381

Inria Saclay - Ile de France

Abstract

In addition to showing how lexical units are related within a field, domain-specific thesauri give an idea of what subjects are important to that field and are thus useful at many points in an information system. The major impediment to creation of thesauri has been the cost of their manual creation. We present here a number of automatic techniques that jointly produce a first draft of a thesaurus from any domain-defining collection of text. The techniques are knowledge-poor in that no domain knowledge is required for their use. We have successfully applied these techniques to over twenty corpora ranging from 1 to 6 megabytes. Results from the thesaurus produced from a collection of medical abstracts will also be presented here.

Domains

Computation and Language [cs.CL]

Main file

Automatic Thesaurus Generation from Raw Text.pdf (164.52 KB)

acl.bst (23.07 KB)

aclap.sty (11.44 KB)

my.bib (145.32 KB)

Origin: Files produced by the author(s)

https://hal.science/hal-01154133

Submitted on: Wednesday, January 6, 2016, 08:05:51

Last modified on: Wednesday, March 15, 2023, 08:56:16

Long-term archiving on: Thursday, April 7, 2016, 15:44:03

Dates and versions

hal-01154133, version 1 (06-01-2016)

Identifiers

HAL Id: hal-01154133, version 1

Cite

Gregory Grefenstette. Automatic Thesaurus Generation from Raw Text using Knowledge-Poor Techniques. MAKING SENSE OF WORDS. NINTH ANNUAL CONFERENCE OF THE UW CENTRE FOR THE NEW OED AND TEXT RESEARCH, Oxford University Press, Sep 1993, Oxford, United Kingdom. ⟨hal-01154133⟩
