Research Article

Segmentation of Characters from Old Typewritten Documents using Radon Transform

by Apurva A. Desai
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 37 - Number 9
Year of Publication: 2012
Authors: Apurva A. Desai
10.5120/4635-6683

Apurva A. Desai. Segmentation of Characters from Old Typewritten Documents using Radon Transform. International Journal of Computer Applications. 37, 9 (January 2012), 10-15. DOI=10.5120/4635-6683

@article{ 10.5120/4635-6683,
author = { Apurva A. Desai },
title = { Segmentation of Characters from Old Typewritten Documents using Radon Transform },
journal = { International Journal of Computer Applications },
issue_date = { January 2012 },
volume = { 37 },
number = { 9 },
month = { January },
year = { 2012 },
issn = { 0975-8887 },
pages = { 10-15 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume37/number9/4635-6683/ },
doi = { 10.5120/4635-6683 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:23:52.093568+05:30
%A Apurva A. Desai
%T Segmentation of Characters from Old Typewritten Documents using Radon Transform
%J International Journal of Computer Applications
%@ 0975-8887
%V 37
%N 9
%P 10-15
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Optical character recognition is a challenging area. Much work has been done, and is still being done, for many languages across the world, including a good amount for many Indian languages. However, hardly any work can be found for Gujarati. Gujarati has a rich literary heritage, and it is therefore important to preserve it for the next generation. This paper attempts to segment words and characters from old typewritten Gujarati documents. The algorithm presented uses a global threshold to convert scanned RGB documents to black-and-white documents. Noise removal is also applied. The Radon transform is used for skew detection. This work also presents a novel use of the Radon transform: it is used to segment documents into lines, and vertical profiles are then used to segment the lines into characters. Finally, the segmentation algorithm is also tested on documents typewritten in Hindi. The algorithm presented here gives very good results.
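
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the default threshold of 128, the candidate angle range and the "sum of squared bin counts" sharpness measure are all assumptions made for illustration. Noise removal and the actual rotation of the page are left out. The sketch relies on the fact that the Radon transform at a single angle is a projection profile, so line segmentation reduces to finding gaps in the projection onto the vertical axis.

```python
import math

def binarize(gray, threshold=128):
    """Global threshold: 1 marks ink (dark pixels), 0 marks background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def radon_projection(binary, theta_deg):
    """Discrete Radon-style projection of ink pixels at one angle.

    Each ink pixel (x, y) is counted in the bin rho = y*cos(t) - x*sin(t),
    so text lines tilted by theta_deg collapse into narrow peaks.
    """
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    bins = {}
    for y, row in enumerate(binary):
        for x, ink in enumerate(row):
            if ink:
                rho = round(y * c - x * s)
                bins[rho] = bins.get(rho, 0) + 1
    return bins

def estimate_skew(binary, angles=range(-5, 6)):
    """Pick the candidate angle (assumed range) with the most peaked projection."""
    def sharpness(theta):
        return sum(v * v for v in radon_projection(binary, theta).values())
    return max(angles, key=sharpness)

def runs(profile):
    """Half-open (start, end) index pairs of consecutive non-zero entries."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v and start is None:
            start = i
        elif not v and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out

def segment_lines(binary):
    """Rows containing ink form text lines (the projection at angle 0)."""
    return runs([sum(row) for row in binary])

def segment_chars(binary, top, bottom):
    """Vertical profile of one text line: columns with ink form characters."""
    width = len(binary[0])
    profile = [sum(binary[y][x] for y in range(top, bottom)) for x in range(width)]
    return runs(profile)
```

In this sketch, `segment_lines(binarize(gray))` returns the row range of each text line on an already deskewed page, and each range can be passed to `segment_chars` to obtain column ranges for the characters in that line. `estimate_skew` only reports an angle; rotating the page by that angle before segmentation would need an image library and is not shown.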

Index Terms

Computer Science
Information Sciences

Keywords

Segmentation, Radon transform, skew correction, digitization, noise removal