Data Driven Lemmatization and Parsing of Italian

Djamé Seddah; Joseph Le Roux; Benoît Sagot

doi:10.1007/978-3-642-35828-9_27

Conference paper, Year: 2012

Djamé Seddah

Role: Author
PersonId : 11545
IdHAL : djameseddah
IdRef : 086185136

Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing

Joseph Le Roux

Role: Author
PersonId : 1192450
IdHAL : joseph-le-roux
ORCID : 0000-0002-3889-8536

Laboratoire d'Informatique de Paris-Nord

Benoît Sagot

Role: Author
PersonId : 1461
IdHAL : bsagot
ORCID : 0000-0002-0107-8526
IdRef : 177454229

Analyse Linguistique Profonde à Grande Echelle ; Large-scale deep linguistic processing

Abstract

This paper presents preliminary results on data-driven lemmatisation for Italian. Based on a joint lemmatisation and part-of-speech tagging model, our system relies on an architecture that has already proved successful for French. Besides an intrinsic evaluation of this task, we measure its usefulness and adequacy by using the output of our system as input for parsing. This approach achieves state-of-the-art parsing accuracy on unlabeled text without any gold information supplied (an F1 score of 83.70% in a 10-fold cross-validation setting) and without requiring any prior knowledge of the language. This shows that our methodology is perfectly suitable for wide-coverage parsing of Italian.

Keywords

statistical parsing; data-driven lemmatisation; morphological clustering; Italian; PCFG-LA

Domains

Document and Text Processing

https://inria.hal.science/hal-00778153

Submitted on: Friday, 18 January 2013, 18:16:55

Last modified on: Friday, 24 March 2023, 14:52:56

Dates and versions

hal-00778153 , version 1 (18-01-2013)

Identifiers

HAL Id : hal-00778153 , version 1
DOI : 10.1007/978-3-642-35828-9_27

Cite

Djamé Seddah, Joseph Le Roux, Benoît Sagot. Data Driven Lemmatization and Parsing of Italian. EVALITA 2011 - Evaluation of NLP and Speech Tools for Italian, Jan 2012, Rome, Italy. pp.249-256, ⟨10.1007/978-3-642-35828-9_27⟩. ⟨hal-00778153⟩

Collections

UNIV-PARIS7 UNIV-PARIS13 CNRS INRIA LIPN INRIA2 GALILE SORBONNE-PARIS-NORD ANR
