Conference paper, Year: 2009

Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm

Thomas Meilender
  • Role: Author
  • PersonId: 760682
  • IdRef: 174106394
Abdel Belaïd
  • Role: Author

Abstract

This paper describes a method for segmenting a continuous document flow. A document flow is a sequence of scanned pages fed into a production chain; it contains several documents with no explicit separator between them. To separate the documents for recognition, the content of successive pages must be analyzed to identify the boundary pages of each document. The proposed method is similar to the variable horizon models (VHM), or multi-grams, used in speech recognition. It consists in maximizing the likelihood of the flow given the Markov models of its constituent elements. Because computing this likelihood over the entire flow is NP-complete, it is instead computed over windows containing a reduced number of observations. First results on homogeneous flows of invoices reach more than 75% precision and 90% recall.
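To illustrate the idea of maximizing likelihood inside a window of pages, here is a minimal Python sketch. It is not the authors' modified backward-forward algorithm: it uses a plain Viterbi-style dynamic program as a stand-in, and the names `segment_window` and `toy_loglik`, the `max_len` bound, and the page labels `"first"` and `"next"` are all hypothetical. In a real system, the scoring function would presumably be supplied by the trained Markov models of each document type.

```python
import math
from typing import Callable, List, Sequence, Tuple


def segment_window(
    pages: Sequence[str],
    segment_loglik: Callable[[Sequence[str]], float],
    max_len: int,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split one window of pages into consecutive candidate documents.

    Each candidate document spans at most `max_len` pages. Returns the best
    total log-likelihood and the half-open (start, end) index pairs.
    """
    n = len(pages)
    best = [-math.inf] * (n + 1)  # best[i]: best score for pages[:i]
    back = [0] * (n + 1)          # back[i]: start of the last segment ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + segment_loglik(pages[start:end])
            if score > best[end]:
                best[end], back[end] = score, start

    # Walk the back-pointers to recover the segment boundaries.
    segments: List[Tuple[int, int]] = []
    end = n
    while end > 0:
        segments.append((back[end], end))
        end = back[end]
    segments.reverse()
    return best[n], segments


def toy_loglik(segment: Sequence[str]) -> float:
    """Hypothetical document model: one 'first' page, then only 'next' pages."""
    ok = segment[0] == "first" and all(p == "next" for p in segment[1:])
    return math.log(0.9) if ok else math.log(0.01)


# Assumed page labels for one window of the flow (illustrative data only).
flow = ["first", "next", "next", "first", "first", "next"]
score, segments = segment_window(flow, toy_loglik, max_len=4)
print(segments)
```

Bounding the segment length and working window by window keeps the search small; how the windows are chosen and chained along the flow is left out of this sketch.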
Main file
meilender-spie.pdf (393.16 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00347217, version 1 (15-12-2008)

Identifiers

  • HAL Id: inria-00347217, version 1

Cite

Thomas Meilender, Abdel Belaïd. Segmentation of Continuous Document Flow by a modified Backward-Forward algorithm. SPIE - Electronic Imaging, 2009, Los Angeles, United States. ⟨inria-00347217⟩
124 views
276 downloads
