Research Article

A Language Independent Characterization of Document Image Noise in Historical Scripts

by Sandhya.n, R. Krishnan, D. R. Ramesh Babu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 50 - Number 9
Year of Publication: 2012
10.5120/7798-0915

Sandhya.n, R. Krishnan, D. R. Ramesh Babu. A Language Independent Characterization of Document Image Noise in Historical Scripts. International Journal of Computer Applications. 50, 9 (July 2012), 11-18. DOI=10.5120/7798-0915

@article{ 10.5120/7798-0915,
author = { Sandhya.n and R. Krishnan and D. R. Ramesh Babu },
title = { A Language Independent Characterization of Document Image Noise in Historical Scripts },
journal = { International Journal of Computer Applications },
issue_date = { July 2012 },
volume = { 50 },
number = { 9 },
month = { July },
year = { 2012 },
issn = { 0975-8887 },
pages = { 11-18 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume50/number9/7798-0915/ },
doi = { 10.5120/7798-0915 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Sandhya.n
%A R. Krishnan
%A D. R. Ramesh Babu
%T A Language Independent Characterization of Document Image Noise in Historical Scripts
%J International Journal of Computer Applications
%@ 0975-8887
%V 50
%N 9
%P 11-18
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Digitizing historical documents helps preserve them. Because these documents have existed for a long time, various types of noise have crept in. In this paper, we analyze the types of noise that occur in printed and handwritten historical documents, based mainly on documents in Kannada (a language spoken in Karnataka, a southern state of India), and organize them into a taxonomy. We characterize each noise type by its source, its effect on characters, and the challenges it poses for character recognition. We also catalogue the noise detection, removal, and restoration techniques reported in the literature for each prominent noise type, and identify areas of noise detection and removal that merit further research.

Index Terms

Computer Science
Information Sciences

Keywords

optical character recognition, global noise, local noise