Research Article

Preprocessing Challenges in Document Image Analysis

Published on May 2012 by Keshao D. Kalaskar, Mahendra P. Dhore
National Conference on Recent Trends in Computing
Foundation of Computer Science USA
NCRTC - Number 9

Keshao D. Kalaskar, Mahendra P. Dhore. Preprocessing Challenges in Document Image Analysis. National Conference on Recent Trends in Computing. NCRTC, 9 (May 2012), 25-30.

@article{
author = { Keshao D. Kalaskar and Mahendra P. Dhore },
title = { Preprocessing Challenges in Document Image Analysis },
journal = { National Conference on Recent Trends in Computing },
issue_date = { May 2012 },
volume = { NCRTC },
number = { 9 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 25-30 },
numpages = 6,
url = { /proceedings/ncrtc/number9/6586-1076/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Recent Trends in Computing
%A Keshao D. Kalaskar
%A Mahendra P. Dhore
%T Preprocessing Challenges in Document Image Analysis
%J National Conference on Recent Trends in Computing
%@ 0975-8887
%V NCRTC
%N 9
%P 25-30
%D 2012
%I International Journal of Computer Applications
Abstract

Document Image Analysis (DIA) is the subfield of digital image processing that aims to convert document images into symbolic form for modification, storage, retrieval, reuse, and transmission. It supports the transition from bookshelves and filing cabinets to a paperless, and perhaps even wireless, world. Preprocessing is the first stage of document image analysis and involves representation, noise reduction, binarization, skew estimation and detection, zoning, and character segmentation. This paper focuses on the major challenges faced in preprocessing document images for document image analysis.
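
The abstract names binarization as one preprocessing step but does not say which method is used. As an illustration only, the following is a minimal sketch that assumes a single global threshold chosen by Otsu's between-class variance criterion, applied to an 8-bit grayscale page stored as a list of rows. The function names `otsu_threshold` and `binarize` and the sample `page` are hypothetical and do not come from the paper.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t form one class, the remaining pixels the other.
    This is an assumed, illustrative choice of method, not the paper's.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels in the dark class yet
        if w1 == 0:
            break     # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map a 2-D list of gray levels to 1 (dark ink) and 0 (background)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]


# Hypothetical 3x3 page: a dark diagonal stroke on a bright background.
page = [
    [200, 200, 30],
    [210, 20, 205],
    [25, 198, 202],
]
t = otsu_threshold([p for row in page for p in row])
mask = binarize(page, t)
```

A global threshold like this tends to work only when illumination is fairly uniform across the page; degraded or unevenly lit scans usually need a locally adaptive threshold instead.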

Index Terms

Computer Science
Information Sciences

Keywords

Document Image Analysis, Information Retrieval, Binarization, Skew Detection, Character Segmentation