Conference paper. Year: 2012

Data Driven Lemmatization and Parsing of Italian

Abstract

This paper presents preliminary results for data-driven lemmatisation of Italian. Based on a joint lemmatisation and part-of-speech tagging model, our system relies on an architecture that has already proved successful for French. Besides an intrinsic evaluation of this task, we measure its usefulness and adequacy by using our system's output as input for parsing. This approach achieves state-of-the-art parsing accuracy on unlabeled text without any gold information supplied (an F1 score of 83.70% in a 10-fold cross-validation setting) and without requiring any prior knowledge of the language. This shows that our methodology is perfectly suitable for wide-coverage parsing of Italian.

Dates and versions

hal-00778153, version 1 (18-01-2013)

Cite

Djamé Seddah, Joseph Le Roux, Benoît Sagot. Data Driven Lemmatization and Parsing of Italian. EVALITA 2011 - Evaluation of NLP and Speech Tools for Italian, Jan 2012, Rome, Italy. pp.249-256, ⟨10.1007/978-3-642-35828-9_27⟩. ⟨hal-00778153⟩