Hybrid OCR combination for ancient documents - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2005

Hybrid OCR combination for ancient documents

Hubert Cecotti
  • Role: Author
  • PersonId : 830534
Abdel Belaïd
  • Role: Author

Abstract

Commercial Optical Character Recognition (OCR) systems have improved a lot in the last few years. Their main strength is their ability to process many different kinds of documents. However, this generality can also be a weakness, as they cannot perfectly recognize documents that differ greatly from typical present-day documents. In this paper we propose a system that combines several OCRs with a specialized ICR (Intelligent Character Recognition) based on a convolutional neural network. Instead of just running several OCRs in parallel and applying a fusion rule to the results, a specialized neural network with an adaptive topology is added to complement the OCRs, according to the OCRs' errors. This system has been tested on ancient documents containing old characters and old fonts not used in contemporary documents. The OCR combination increases the recognition rate by about 3%, whereas the ICR improves the recognition of rejected characters by more than 5%.
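
The combination scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the fusion rule, so a plain majority vote with rejection is assumed here, the OCR outputs are assumed to be already aligned character by character, and `fallback` is a hypothetical stand-in for the specialized ICR that handles rejected characters.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

REJECT = None  # marker for a character the OCR combination could not decide


def vote(candidates: Sequence[str], min_agreement: int = 2) -> Optional[str]:
    """Return the majority label, or REJECT when agreement is weak or tied.

    Majority voting is an assumed fusion rule, chosen for illustration only.
    """
    counts = Counter(candidates).most_common()
    if not counts:
        return REJECT
    top_label, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    if top_count < min_agreement or tied:
        return REJECT
    return top_label


def combine(
    ocr_outputs: Sequence[str],
    fallback: Callable[[int], str],
    min_agreement: int = 2,
) -> str:
    """Fuse aligned OCR outputs; send rejected positions to `fallback`.

    `fallback(position)` is a hypothetical hook standing in for the
    specialized ICR; it receives the index of the rejected character.
    """
    if len({len(s) for s in ocr_outputs}) != 1:
        raise ValueError("OCR outputs must be non-empty and aligned to equal length")
    result = []
    for position, candidates in enumerate(zip(*ocr_outputs)):
        label = vote(candidates, min_agreement)
        if label is REJECT:
            label = fallback(position)  # only rejected characters reach the ICR
        result.append(label)
    return "".join(result)
```

In this sketch the fallback is consulted only when the engines disagree, which mirrors the idea of a specialized recognizer complementing the OCRs on their errors rather than re-reading every character.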

Dates and versions

inria-00000366 , version 1 (27-09-2005)

Cite

Hubert Cecotti, Abdel Belaïd. Hybrid OCR combination for ancient documents. Third International Conference on Advances in Pattern Recognition - ICAPR 2005, Aug 2005, Bath/UK, pp.646-653, ⟨10.1007/11551188⟩. ⟨inria-00000366⟩