Towards a segmentation and recognition-free approach for content-based document image retrieval of handwritten queries

Author(s)
Chatbri, Houssem
Kameyama, Keisuke
Kwan, Paul H
Publication Date
2015
Abstract
We introduce a method for content-based document image retrieval (CBDIR) of handwritten queries that is both segmentation-free and recognition-free. We first demonstrate that our method is underpinned by a theoretical model that exploits Bayes' rule. Next, we present an algorithmic implementation that takes into account real-world retrieval challenges caused by handwriting fluctuations and style variations. Our algorithm operates as follows: First, a number of connected components of the query are matched against the connected components of the document image using shape features. A similarity threshold is used to select the connected components of the document image that are most similar to the query components. Then, the selected components are used to detect candidate occurrences of the query in the document image by using size-adaptive bounding boxes. Finally, a score is calculated for each candidate occurrence and used for ranking. We conduct a comparative evaluation of our method on a dataset of 200 printed document images by executing 40 printed and 200 handwritten queries of mathematical expressions. Experimental results demonstrate competitive performance, with P-Recall = 100% and A-Recall = 99.95% for printed queries, and P-Recall = 73.5% and A-Recall = 57.92% for handwritten queries, outperforming a state-of-the-art CBDIR algorithm.
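The four steps listed in the abstract (component matching, thresholding, size-adaptive bounding boxes, scoring and ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the shape features, the similarity measure, the box construction, or the scoring function, so everything below is an assumption made for illustration: the `Component` class, the Euclidean-distance similarity, the height-ratio scaling of the query box, the mean-best-similarity score, and the threshold of 0.8 are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Component:
    """A connected component: bounding box plus a hypothetical shape feature vector."""
    x: float
    y: float
    w: float
    h: float
    features: tuple


def similarity(a, b):
    """Assumed measure: maps Euclidean feature distance into (0, 1]; 1 means identical."""
    return 1.0 / (1.0 + dist(a.features, b.features))


def bounding_box(components):
    """Smallest axis-aligned box (x0, y0, x1, y1) enclosing all components."""
    return (min(c.x for c in components), min(c.y for c in components),
            max(c.x + c.w for c in components), max(c.y + c.h for c in components))


def contains(box, c):
    """True if the centre of component c lies inside the box."""
    cx, cy = c.x + c.w / 2, c.y + c.h / 2
    return box[0] <= cx <= box[2] and box[1] <= cy <= box[3]


def retrieve(query, document, threshold=0.8):
    """Return candidate occurrences as (score, box) pairs, best first."""
    qx0, qy0, qx1, qy1 = bounding_box(query)
    candidates = []
    for q in query:
        for d in document:
            # Steps 1 and 2: match components and keep pairs above the threshold.
            if similarity(q, d) < threshold:
                continue
            # Step 3: size-adaptive box, i.e. the query box rescaled by the
            # height ratio of the matched pair and anchored on the match.
            s = d.h / q.h
            bx0 = d.x - (q.x - qx0) * s
            by0 = d.y - (q.y - qy0) * s
            box = (bx0, by0, bx0 + (qx1 - qx0) * s, by0 + (qy1 - qy0) * s)
            # Step 4: score the candidate as the mean, over query components,
            # of the best similarity found among document components in the box.
            inside = [c for c in document if contains(box, c)]
            score = sum(
                max((similarity(qc, c) for c in inside), default=0.0)
                for qc in query
            ) / len(query)
            candidates.append((score, box))
    candidates.sort(key=lambda t: t[0], reverse=True)
    return candidates


# Toy data: a two-component query, the same pair drawn twice as large in the
# document, and a lone distractor that resembles only the first component.
query = [Component(0, 0, 10, 10, (1.0, 0.0)), Component(12, 0, 10, 10, (0.0, 1.0))]
document = [
    Component(100, 50, 20, 20, (1.0, 0.0)),
    Component(124, 50, 20, 20, (0.0, 1.0)),
    Component(300, 300, 20, 20, (1.0, 0.0)),
]
ranked = retrieve(query, document)
```

In this toy setup the box around the full occurrence ranks above the box around the distractor, because only the former contains a good match for every query component. No word segmentation or symbol recognition is performed at any point, which is the property the abstract emphasises.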
Citation
Proceedings of the Third IAPR Asian Conference on Pattern Recognition (ACPR 2015), pp. 146-150
ISBN
9781479961009
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Type of document
Conference Publication
Entity Type
Publication