Journal article, Language Resources and Evaluation, Year: 2014

Data-driven Synset Induction and Disambiguation for Wordnet Development

Abstract

Automatic methods for wordnet development in languages other than English generally exploit information found in Princeton WordNet (PWN) and translations extracted from parallel corpora. A common approach consists of preserving the structure of PWN and transferring its content into new languages using alignments, possibly combined with information extracted from multilingual semantic resources. Even if the role of PWN remains central in this process, these automatic methods offer an alternative to the manual elaboration of new wordnets. However, their limited coverage has a strong impact on that of the resulting resources. Following this line of research, we apply a cross-lingual word sense disambiguation method to wordnet development. Our approach exploits the output of a data-driven sense induction method that generates sense clusters in new languages, similar to wordnet synsets, by identifying word senses and relations in parallel corpora. We apply our cross-lingual word sense disambiguation method to the task of enriching a French wordnet resource, the WOLF, and show how it can be used efficiently to increase its coverage. Although our experiments involve the English-French language pair, the proposed methodology is general enough to be applied to the development of wordnet resources in other languages for which parallel corpora are available. Finally, we show how the disambiguation output can serve to reduce the granularity of new wordnets and the degree of polysemy present in PWN.
Main file
LRE_Apidianaki_Sagot_camera_ready.pdf (202.93 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01088000 , version 1 (27-11-2014)

Cite

Marianna Apidianaki, Benoît Sagot. Data-driven Synset Induction and Disambiguation for Wordnet Development. Language Resources and Evaluation, 2014, 48 (4), pp.655-677. ⟨10.1007/s10579-014-9291-2⟩. ⟨hal-01088000⟩