Conference paper, Year: 2000

A Framework for Multi-level Linguistic Annotation

Abstract

This article presents a three-step model for multi-layer annotation of corpora. Each kind of annotation of a textual corpus corresponds to a different view of the same document. This principle can be expressed first with a general relational model dedicated to the organisation of LR. This abstract model is then implemented as an application of the XML formalism for the encoding of large corpora. Exploiting this kind of annotated corpus requires efficient manipulation processes and reversive access. We propose a third-step representation based on a set of optimised FSA resulting from the parsing of the XML documents. These propositions have been implemented in the first version of a workbench dedicated to the French Le Monde corpus.
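The idea that each annotation layer is a separate view on the same document can be illustrated with a minimal sketch. It assumes a stand-off XML encoding in which layers point at shared token identifiers. The element and attribute names (corpus, text, w, layer, ann, target, value), the sample sentence, and the load_views helper are hypothetical and are not taken from the paper. The sketch covers only the XML step and does not attempt the FSA representation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-off encoding: every element and attribute name here
# is an illustrative assumption, not the schema used in the paper.
DOC = """
<corpus>
  <text>
    <w id="w1">Le</w>
    <w id="w2">Monde</w>
    <w id="w3">publie</w>
  </text>
  <layer name="pos">
    <ann target="w1" value="DET"/>
    <ann target="w2" value="NOUN"/>
    <ann target="w3" value="VERB"/>
  </layer>
  <layer name="chunk">
    <ann target="w1 w2" value="NP"/>
    <ann target="w3" value="VP"/>
  </layer>
</corpus>
"""


def load_views(xml_text):
    """Parse the document once and return (tokens, views).

    tokens maps token id -> surface form; views maps layer name -> list of
    (label, [surface forms]) pairs, so each layer is a view over the same
    underlying token sequence.
    """
    root = ET.fromstring(xml_text)
    tokens = {w.get("id"): w.text for w in root.find("text")}
    views = defaultdict(list)
    for layer in root.findall("layer"):
        for ann in layer.findall("ann"):
            ids = ann.get("target").split()
            views[layer.get("name")].append(
                (ann.get("value"), [tokens[i] for i in ids])
            )
    return tokens, dict(views)


tokens, views = load_views(DOC)
for name, annotations in views.items():
    print(name, annotations)
```

Because the layers only reference token identifiers, a new kind of annotation can be added as another layer without touching the text or the existing layers.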
Main file
lopez-romary.pdf (228.93 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00525227, version 1 (11-10-2010)

Identifiers

  • HAL Id: inria-00525227, version 1

Cite

Patrice Lopez, Laurent Romary. A Framework for Multi-level Linguistic Annotation. LREC Workshop on Large Corpus Annotation and Software Standards, Data Architectures and Software Support for Large Corpora, May 2000, Athens, Greece. ⟨inria-00525227⟩
