Wake up, standOff! - INRIA - Institut National de Recherche en Informatique et en Automatique
Document associated with scientific events. Year: 2016

Wake up, standOff!

Abstract

The paper provides an overview of, and an update on, the ongoing proposal to create a <standOff> component within the TEI architecture. It sets out the conceptual background of embedding stand-off annotations within a TEI document and the consequences in terms of primary source preservation, multiple annotation views and the possible export of annotation content into autonomous TEI documents. It demonstrates the various types of possible use cases, ranging from manual annotation to fully automated information extraction processes, and shows the importance of implementing, right from the outset, the possibility to use any kind of internal or external vocabulary for representing annotation bodies (e.g. to deal with structural or conceptual annotations). An important prospect here is that the construct could simplify the development of TEI-aware online services such as named entity recognisers. We relate the proposal to ongoing initiatives and show the necessity of aligning it with the W3C Web Annotation Data Model, as well as with a dedicated element for speech transcription, recently introduced as part of the work carried out for the ISO standard 24624, which serves as an elementary annotation crystal in the sense of Romary and Wegstein (2012). In this context we tackle the issue of implicitness in the representation of annotations and open the debate on the trade-off between a terse and a highly flexible model. We conclude by illustrating how the current proposal is already being applied in various projects related to data mining or scientific information, and in particular to the representation of annotated scholarly content.
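To make the stand-off principle concrete: annotations live in a separate block and point into the primary text by identifier, so the source itself is never altered, and each annotation can then be re-expressed in the W3C Web Annotation Data Model. The Python sketch below illustrates this under explicit assumptions. The sample sentence, the `<standOff>`/`<spanGrp>`/`<span>` layout, the `#event`/`#place` labels, the example.org URI and the helper names `add_standoff` and `to_web_annotations` are all hypothetical; the element structure actually defined by the proposal may differ, and this is not the authors' implementation.

```python
import json
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # Clark notation for xml:id
ET.register_namespace("", TEI_NS)  # serialise TEI elements without a prefix

# Hypothetical primary source (teiHeader omitted for brevity): every token
# carries an xml:id so that annotations can point at it from outside the text.
SOURCE = f"""<TEI xmlns="{TEI_NS}">
  <text><body><p>
    <w xml:id="w1">TEI</w> <w xml:id="w2">Conference</w> <w xml:id="w3">2016</w>
    <w xml:id="w4">met</w> <w xml:id="w5">in</w> <w xml:id="w6">Vienna</w>
  </p></body></text>
</TEI>"""


def tei(tag: str) -> str:
    """Return the namespace-qualified (Clark notation) name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def add_standoff(root: ET.Element, annotations: list) -> ET.Element:
    """Append a hypothetical <standOff> block after <text>.

    `annotations` is a list of (target token ids, analysis label) pairs.
    The primary text is left untouched; the standOff/spanGrp/span layout
    is illustrative only, not the proposal's schema.
    """
    standoff = ET.SubElement(root, tei("standOff"))
    group = ET.SubElement(standoff, tei("spanGrp"))
    for i, (targets, label) in enumerate(annotations, start=1):
        span = ET.SubElement(group, tei("span"))
        span.set(XML_ID, f"ann{i}")
        span.set("target", " ".join(f"#{tid}" for tid in targets))
        span.set("ana", label)
    return standoff


def to_web_annotations(root: ET.Element, source_uri: str) -> list:
    """Map each stand-off <span> to a W3C Web Annotation (JSON-LD) dictionary."""
    result = []
    for span in root.iter(tei("span")):
        result.append({
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "id": f"{source_uri}#{span.get(XML_ID)}",
            "type": "Annotation",
            "body": {"type": "TextualBody", "value": span.get("ana")},
            # "#w1" becomes "<source_uri>#w1": one target IRI per token.
            "target": [source_uri + ref for ref in span.get("target").split()],
        })
    return result


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    text_before = "".join(root.find(tei("text")).itertext())
    add_standoff(root, [(["w1", "w2", "w3"], "#event"), (["w6"], "#place")])
    # The primary source is preserved: the text content is unchanged.
    assert "".join(root.find(tei("text")).itertext()) == text_before
    print(ET.tostring(root, encoding="unicode"))
    print(json.dumps(to_web_annotations(root, "http://example.org/doc.xml"), indent=2))
```

Keeping the annotation layer as a sibling of `<text>` is what allows several annotation views over the same source and a mechanical export to another model, here sketched as a one-to-one mapping from each span to a Web Annotation.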
Further materials
• Minutes of the January 2014 meeting: http://download2.polytechnic.edu.na/pub7/sourceforge/l/li/lingsig/Documents/Standoff%20in%20Berlin,%2001.2014/standoff-minutesBerlin2014.pdf
• The TEI GitHub ticket: https://github.com/TEIC/TEI/issues/374
• The standOff proposal on GitHub: https://github.com/laurentromary/stdfSpec (branch AnnArbor)

References
• Bański, Piotr (2010). Why TEI standoff annotation doesn't quite work: and why you might want to use it nevertheless. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5.
• ISO/DIS 24624. Language resource management: Transcription of spoken language.
• Pose, Javier, Patrice Lopez and Laurent Romary (2014). A Generic Formalism for Encoding Stand-off Annotations in TEI.
• Romary, Laurent (2015). TEI challenges in an accelerating digital world. DiXiT Convention Week, Sep 2015, The Hague, Netherlands.
• Romary, Laurent and Werner Wegstein (2012). Consistent Modeling of Heterogeneous Lexical Structures. Journal of the Text Encoding Initiative, Issue 3, November 2012 (online since 15 October 2012; accessed 12 May 2016). URL: http://jtei.revues.org/540; DOI: 10.4000/jtei.540 (section on crystals: https://jtei.revues.org/540#tocfrom2n1)
• W3C. Web Annotation Data Model. https://www.w3.org/TR/annotation-model/
Main file
WakeUpStandOff.pdf (746.09 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01374102, version 1 (29-09-2016)

License

Attribution

Identifiers

• HAL Id: hal-01374102, version 1

Cite

Piotr Banski, Bertrand Gaiffe, Patrice Lopez, Simon Meoni, Laurent Romary, et al. Wake up, standOff! TEI Conference 2016, Sep 2016, Vienna, Austria. ⟨hal-01374102⟩
