The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Abstract
Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have rarely been modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
Main file
W17-1704.pdf (278.76 KB)
W17-7610.pdf (441.44 KB)
Origin: Publisher files allowed on an open archive