Big Data Methods for Computational Linguistics - INRIA - Institut National de Recherche en Informatique et en Automatique
Journal article in Bulletin of the Technical Committee on Data Engineering. Year: 2012

Big Data Methods for Computational Linguistics

Abstract

Many tasks in computational linguistics traditionally rely on hand-crafted or curated resources like thesauri or word-sense-annotated corpora. The availability of big data, from the Web and other sources, has changed this situation. Harnessing these assets requires scalable methods for data and text analytics. This paper gives an overview of our recent work that utilizes big data methods for enhancing semantics-centric tasks dealing with natural language texts. We demonstrate a virtuous cycle in harvesting knowledge from large data and text collections and leveraging this knowledge in order to improve the annotation and interpretation of language in Web pages and social media. Specifically, we show how to build large dictionaries of names and paraphrases for entities and relations, and how these help to disambiguate entity mentions in texts.

Dates and versions

hal-01122699, version 1 (04-03-2015)

Identifiers

  • HAL Id: hal-01122699, version 1

Cite

Gerhard Weikum, Johannes Hoffart, Ndapa Nakashole, Marc Spaniol, Fabian M. Suchanek, et al. Big Data Methods for Computational Linguistics. Bulletin of the Technical Committee on Data Engineering, 2012, pp.10. ⟨hal-01122699⟩