Form item extraction based on line searching
Abstract
This paper presents a method for extracting the items of a form, i.e. the regions delimited by its lines, which has been applied to various kinds of forms. The approach is based on line detection with the Hough transform. Once the straight lines have been obtained, their Hough directions are used to detect the actual segments in the image. A segment can correspond either to a continuous line or to a black part of a dashed or dotted line. The segments lying between two adjacent line crossing points are therefore grouped together and classified. Items are then located by searching for the minimum cycles of the graph built from the line intersection points. The last step verifies the line classes under the hypothesis that the sides of an item are homogeneous. The method was applied to French tax forms and to tables from scientific publications, and the experimental results demonstrate its robustness and reliability on various forms with different types of item delimiters.
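The abstract does not specify how the minimum cycles are computed. The following is a minimal sketch of one standard way to do it, assuming that the validated segments between adjacent crossing points are already available as edges between `(x, y)` intersection points and that the resulting graph is connected. It enumerates the bounded faces of the planar line graph by face traversal (a textbook technique swapped in here for illustration, not necessarily the authors' procedure); `find_items` is a hypothetical name. Because faces follow whatever edges are present, a merged cell (a missing rule between two cells) naturally comes out as one larger item.

```python
import math
from collections import defaultdict

Point = tuple[float, float]


def find_items(edges: list[tuple[Point, Point]]) -> list[list[Point]]:
    """Hypothetical helper: return the bounded faces (minimal cycles) of a
    connected planar graph whose vertices are line intersection points."""
    # Adjacency: each crossing point -> neighbours joined by a detected segment.
    nbrs: dict[Point, set[Point]] = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    # Sort each vertex's neighbours counter-clockwise by angle.
    order = {
        v: sorted(ns, key=lambda w, v=v: math.atan2(w[1] - v[1], w[0] - v[0]))
        for v, ns in nbrs.items()
    }

    visited: set[tuple[Point, Point]] = set()
    items: list[list[Point]] = []
    for start in ((u, v) for u in order for v in order[u]):
        if start in visited:
            continue
        # Walk one face: every directed edge belongs to exactly one face.
        face: list[Point] = []
        u, v = start
        while (u, v) not in visited:
            visited.add((u, v))
            face.append(u)
            around = order[v]
            # Next directed edge: the neighbour of v just before u in CCW order
            # (index -1 wraps around, which also handles dead ends).
            w = around[around.index(u) - 1]
            u, v = v, w
        # Shoelace formula: bounded faces get a positive signed area, the
        # outer face a negative one, and degenerate (tree-like) faces zero.
        area = sum(a[0] * b[1] - b[0] * a[1]
                   for a, b in zip(face, face[1:] + face[:1])) / 2
        if area > 0:
            items.append(face)
    return items


# Usage: two cells sharing a vertical rule (T-junctions at top and bottom).
edges = [((0, 0), (1, 0)), ((1, 0), (2, 0)), ((0, 1), (1, 1)), ((1, 1), (2, 1)),
         ((0, 0), (0, 1)), ((1, 0), (1, 1)), ((2, 0), (2, 1))]
print(len(find_items(edges)))  # 2
```

Working on the planar faces rather than on an arbitrary cycle basis keeps each item tied to an actual region of the form, which is what a later check on the homogeneity of item sides would need.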