Research Article

Line and Word Segmentation Approach for Printed Documents

Published in 2010 by Nallapareddy Priyanka, Srikanta Pal, Ranju Manda
Recent Trends in Image Processing and Pattern Recognition
Foundation of Computer Science USA
RTIPPR - Number 1
2010
Authors: Nallapareddy Priyanka, Srikanta Pal, Ranju Manda

Nallapareddy Priyanka, Srikanta Pal, Ranju Manda. Line and Word Segmentation Approach for Printed Documents. Recent Trends in Image Processing and Pattern Recognition. RTIPPR, 1 (2010), 30-36.

@article{
author = { Nallapareddy Priyanka, Srikanta Pal, Ranju Manda },
title = { Line and Word Segmentation Approach for Printed Documents },
journal = { Recent Trends in Image Processing and Pattern Recognition },
issue_date = { 2010 },
volume = { RTIPPR },
number = { 1 },
year = { 2010 },
issn = { 0975-8887 },
pages = { 30-36 },
numpages = 7,
url = { /specialissues/rtippr/number1/973-96/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Special Issue Article
%1 Recent Trends in Image Processing and Pattern Recognition
%A Nallapareddy Priyanka
%A Srikanta Pal
%A Ranju Manda
%T Line and Word Segmentation Approach for Printed Documents
%J Recent Trends in Image Processing and Pattern Recognition
%@ 0975-8887
%V RTIPPR
%N 1
%P 30-36
%D 2010
%I International Journal of Computer Applications
Abstract

Line and word segmentation is one of the important steps in OCR systems. In this paper we propose a robust method for segmenting individual text lines, based on a modified histogram obtained from run-length-based smearing. We present a complete line and word segmentation system for some popular printed Indian languages. Both foreground and background information are used for accurate line segmentation. Characters in two consecutive text lines may touch or overlap, and most line segmentation errors arise from such touching and overlapping characters. Interline spacing and noise can also make line segmentation difficult. Our method handles these situations accurately. Word segmentation from individual lines is also discussed. We tested our method on documents in Bangla, Devnagari, Kannada, and Telugu scripts, as well as on some multi-script documents, and obtained encouraging results.
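
The abstract names run-length smearing and a histogram as the basis for line segmentation but does not give the details of the modified histogram. The following is a minimal sketch of the generic idea only, not the authors' method: it assumes a binary image stored as a list of rows (1 = ink, 0 = background), smears short background gaps in each row, and splits lines where the row histogram of the smeared image drops to a noise level. Words are then split on wide gaps in the column profile of each line. All function names and the parameters `max_gap`, `noise_level`, and `min_gap` are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed layout: 1 = ink (foreground), 0 = background


def smear_rows(img: Image, max_gap: int) -> Image:
    """Fill background runs of length <= max_gap lying between two ink pixels in a row."""
    out = [row[:] for row in img]
    for src, dst in zip(img, out):
        last_ink = None
        for x, v in enumerate(src):
            if v:
                if last_ink is not None and 0 < x - last_ink - 1 <= max_gap:
                    for k in range(last_ink + 1, x):
                        dst[k] = 1
                last_ink = x
    return out


def find_runs(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile > threshold."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Image, max_gap: int = 3, noise_level: int = 0) -> List[Tuple[int, int]]:
    """Row ranges whose smeared ink count exceeds noise_level."""
    smeared = smear_rows(img, max_gap)
    histogram = [sum(row) for row in smeared]
    return find_runs(histogram, noise_level)


def segment_words(img: Image, line: Tuple[int, int], min_gap: int = 2) -> List[Tuple[int, int]]:
    """Column ranges of words in one line; gaps narrower than min_gap are merged."""
    top, bottom = line
    profile = [sum(img[y][x] for y in range(top, bottom)) for x in range(len(img[0]))]
    words: List[Tuple[int, int]] = []
    for start, end in find_runs(profile):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)  # gap too small: same word
        else:
            words.append((start, end))
    return words


# Toy page: two text lines separated by a row holding a single noise speck.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 0, 1, 0],
]
lines = segment_lines(page, max_gap=3, noise_level=1)
print(lines)                           # [(1, 3), (6, 7)]
print(segment_words(page, lines[0]))   # [(0, 4), (7, 9)]
```

In this sketch, smearing raises the ink count of genuine text rows well above that of an isolated speck, so a small threshold on the histogram can discard the noise row. Touching or overlapping lines, which the abstract identifies as the main source of errors, are not handled here.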

Index Terms

Computer Science
Information Sciences

Keywords

Line segmentation, Word segmentation, Histogram, Indian documents