Big Data Methods for Computational Linguistics

Gerhard Weikum; Johannes Hoffart; Ndapa Nakashole; Marc Spaniol; Fabian M. Suchanek; Mohamed Amir Yosef

Journal article, Bulletin of the Technical Committee on Data Engineering, Year: 2012


Gerhard Weikum

Role: Author

Max-Planck-Institut für Informatik

Johannes Hoffart

Role: Author

Max-Planck-Institut für Informatik

Ndapa Nakashole

Role: Author

Max-Planck-Institut für Informatik

Marc Spaniol

Role: Author
PersonId: 753180
IdHAL: marc-spaniol
ORCID: 0000-0002-5094-4523

Equipe Hultech - Laboratoire GREYC - UMR6072

Fabian M. Suchanek

Role: Author
PersonId: 12540
IdHAL: fabian-suchanek
ORCID: 0000-0001-7189-2796
IdRef: 203477707

Distributed and heterogeneous data and knowledge

Mohamed Amir Yosef

Role: Author

Max-Planck-Institut für Informatik

Abstract

Many tasks in computational linguistics traditionally rely on hand-crafted or curated resources like thesauri or word-sense-annotated corpora. The availability of big data, from the Web and other sources, has changed this situation. Harnessing these assets requires scalable methods for data and text analytics. This paper gives an overview of our recent work that utilizes big data methods for enhancing semantics-centric tasks dealing with natural language texts. We demonstrate a virtuous cycle in harvesting knowledge from large data and text collections and leveraging this knowledge in order to improve the annotation and interpretation of language in Web pages and social media. Specifically, we show how to build large dictionaries of names and paraphrases for entities and relations, and how these help to disambiguate entity mentions in texts.

Domains

Computer Science [cs]

Contributor: Marc Spaniol

https://hal.science/hal-01122699

Submitted on: Wednesday, March 4, 2015, 13:55:47

Last modified on: Wednesday, March 20, 2024, 16:20:04

Dates and versions

hal-01122699, version 1 (04-03-2015)

Identifiers

HAL Id: hal-01122699, version 1

Cite

Gerhard Weikum, Johannes Hoffart, Ndapa Nakashole, Marc Spaniol, Fabian M. Suchanek, et al. Big Data Methods for Computational Linguistics. Bulletin of the Technical Committee on Data Engineering, 2012, pp.10. ⟨hal-01122699⟩


Collections

EC-PARIS CNRS INRIA GREYC GREYC-HULTECH UMR8623 COMUE-NORMANDIE INRIA2 UNIV-PARIS-SACLAY ENSICAEN UNICAEN

