Keyword Spotting in Scanned Images of Historical Handwritten Devanagri Documents

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2019
Authors:
Sushma S. N., Sharada B.
DOI: 10.5120/ijca2019918322

Sushma S. N. and Sharada B. Keyword Spotting in Scanned Images of Historical Handwritten Devanagri Documents. International Journal of Computer Applications 181(36):5-9, January 2019.

BibTeX

@article{10.5120/ijca2019918322,
	author = {Sushma S. N. and Sharada B.},
	title = {Keyword Spotting in Scanned Images of Historical Handwritten Devanagri Documents},
	journal = {International Journal of Computer Applications},
	issue_date = {January 2019},
	volume = {181},
	number = {36},
	month = {Jan},
	year = {2019},
	issn = {0975-8887},
	pages = {5-9},
	numpages = {5},
	url = {http://www.ijcaonline.org/archives/volume181/number36/30264-2019918322},
	doi = {10.5120/ijca2019918322},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

A huge quantity of information lies dormant in historical manuscripts. This information will be lost if it is not stored digitally. In keyword spotting, all occurrences of a query keyword image are retrieved from scanned document images. Spotting words in handwritten documents is difficult because writing styles vary widely and the vocabulary is large. Existing keyword spotting approaches are mainly based on statistical representations of the word image. This paper presents an efficient structural representation of the word image, in which handwritten words are represented using a graph-based method for historical handwritten Devanagari manuscripts. Experiments were conducted on Shankaracharya’s historical handwritten documents written in Devanagari. The results were promising in terms of accuracy and efficiency.
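
The abstract does not say how the word graphs are built or compared, so the following Python sketch is only an illustration of the general idea of graph-based keyword spotting, not the authors' method. It assumes that word images are already segmented and binarized (given here as rows of `#` for ink and `.` for background), places one node per inked grid cell, links nodes of touching cells, and ranks candidates with a greedy node matching that stands in for a proper graph edit distance. The function names `word_graph`, `graph_distance`, and `spot`, the cell size, and the cost weights are all hypothetical choices made for this example.

```python
from itertools import product
from math import hypot


def word_graph(image, cell=2):
    """Build a small graph from a binary word image (rows of '#' and '.')."""
    height, width = len(image), len(image[0])
    # Group ink pixels by the grid cell they fall into.
    cells = {}
    for y, row in enumerate(image):
        for x, ch in enumerate(row):
            if ch == "#":
                cells.setdefault((y // cell, x // cell), []).append((x, y))
    # One node per inked cell, placed at the ink centroid, scaled by image size.
    nodes = {
        key: (sum(x for x, _ in pts) / len(pts) / width,
              sum(y for _, y in pts) / len(pts) / height)
        for key, pts in cells.items()
    }
    # Connect nodes whose cells touch (8-neighbourhood).
    edges = set()
    for (r, c) in nodes:
        for dr, dc in product((-1, 0, 1), repeat=2):
            other = (r + dr, c + dc)
            if other != (r, c) and other in nodes:
                edges.add(frozenset(((r, c), other)))
    return nodes, edges


def degrees(graph):
    """Count the edges incident to each node."""
    nodes, edges = graph
    deg = {key: 0 for key in nodes}
    for edge in edges:
        for key in edge:
            deg[key] += 1
    return deg


def graph_distance(g1, g2, miss=1.0):
    """Greedy node matching: a crude stand-in for graph edit distance (assumption)."""
    n1, n2 = g1[0], g2[0]
    d1, d2 = degrees(g1), degrees(g2)
    # Cost of pairing two nodes: position difference plus a small degree penalty.
    pairs = sorted(
        (hypot(n1[a][0] - n2[b][0], n1[a][1] - n2[b][1])
         + 0.1 * abs(d1[a] - d2[b]), a, b)
        for a in n1 for b in n2
    )
    used1, used2, total = set(), set(), 0.0
    for cost, a, b in pairs:
        # Pair two nodes only if that is cheaper than leaving both unmatched.
        if a not in used1 and b not in used2 and cost < 2 * miss:
            used1.add(a)
            used2.add(b)
            total += cost
    unmatched = (len(n1) - len(used1)) + (len(n2) - len(used2))
    total += miss * unmatched
    return total / max(len(n1), len(n2), 1)


def spot(query, candidates):
    """Rank candidate word images by graph distance to the query (smallest first)."""
    q = word_graph(query)
    return sorted((graph_distance(q, word_graph(img)), name)
                  for name, img in candidates.items())


if __name__ == "__main__":
    # Toy images: a headline with a vertical stroke, and an unrelated shape.
    query = ["######", "..#...", "..#...", "..#..."]
    other = ["#.....", "#.....", "#.....", "######"]
    for dist, name in spot(query, {"same": query, "other": other}):
        print(name, round(dist, 3))
```

In this sketch an identical word image scores a distance of zero and is ranked first, while structurally different words score higher. A real system would need proper binarization, keypoint extraction, and a better-founded graph matching than this greedy approximation.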

Keywords

Keyword spotting, segmentation, ranking