Line and Word Segmentation Approach for Printed Documents

Research Article

Published in 2010 by Nallapareddy Priyanka, Srikanta Pal, Ranju Manda

Recent Trends in Image Processing and Pattern Recognition

Foundation of Computer Science USA

RTIPPR - Number 1

2010

Authors: Nallapareddy Priyanka, Srikanta Pal, Ranju Manda

Nallapareddy Priyanka, Srikanta Pal, Ranju Manda. Line and Word Segmentation Approach for Printed Documents. Recent Trends in Image Processing and Pattern Recognition. RTIPPR, 1 (2010), 30-36.

@article{
author = { Nallapareddy Priyanka and Srikanta Pal and Ranju Manda },
title = { Line and Word Segmentation Approach for Printed Documents },
journal = { Recent Trends in Image Processing and Pattern Recognition },
issue_date = { 2010 },
volume = { RTIPPR },
number = { 1 },
year = { 2010 },
issn = { 0975-8887 },
pages = { 30-36 },
numpages = { 7 },
url = { /specialissues/rtippr/number1/973-96/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Special Issue Article
%1 Recent Trends in Image Processing and Pattern Recognition
%A Nallapareddy Priyanka
%A Srikanta Pal
%A Ranju Manda
%T Line and Word Segmentation Approach for Printed Documents
%J Recent Trends in Image Processing and Pattern Recognition
%@ 0975-8887
%V RTIPPR
%N 1
%P 30-36
%D 2010
%I International Journal of Computer Applications

Abstract

Line and word segmentation is an important step in OCR systems. In this paper we propose a robust method for segmenting individual text lines, based on a modified histogram obtained from run-length-based smearing. We present a complete line and word segmentation system for printed documents in several popular Indian languages. Both foreground and background information are used for accurate line segmentation. Characters in two consecutive text lines may touch or overlap, and most line segmentation errors are caused by such touching and overlapping characters. Interline spacing and noise can also make line segmentation difficult. Our method handles these situations accurately. Word segmentation from individual lines is also discussed. We tested the method on documents in Bangla, Devnagari, Kannada and Telugu scripts, as well as on some multi-script documents, and obtained encouraging results.
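
As a rough illustration of the general idea, the sketch below applies plain horizontal run-length smearing to a binary image, takes a row histogram of the smeared image to find text lines, and then splits each line into words at wide blank column gaps. It is not the authors' modified-histogram method, whose details the abstract does not give, and it makes no attempt to separate touching or overlapping lines. The function names, the `max_gap` and `min_gap` parameters, and the convention that 1 means ink are all assumptions made for this example.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed convention: 1 = ink (foreground), 0 = background
Span = Tuple[int, int]   # half-open (start, end) index range


def smear_rows(img: Image, max_gap: int) -> Image:
    """Horizontal run-length smearing: fill background runs of length
    <= max_gap that lie between two ink pixels on the same row."""
    out = [row[:] for row in img]
    for row in out:
        last_ink = -1
        for x, value in enumerate(row):
            if value:
                gap = x - last_ink - 1
                if last_ink >= 0 and 0 < gap <= max_gap:
                    for k in range(last_ink + 1, x):
                        row[k] = 1
                last_ink = x
    return out


def runs_above(profile: List[int], threshold: int) -> List[Span]:
    """Return the index ranges where the profile exceeds the threshold."""
    runs: List[Span] = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Image, max_gap: int = 3, threshold: int = 0) -> List[Span]:
    """Text lines are the row ranges where the smeared row histogram
    rises above the threshold."""
    smeared = smear_rows(img, max_gap)
    row_hist = [sum(row) for row in smeared]
    return runs_above(row_hist, threshold)


def segment_words(img: Image, line: Span, min_gap: int = 2) -> List[Span]:
    """Split one line into words: ink column ranges separated by fewer
    than min_gap blank columns are merged into a single word."""
    top, bottom = line
    width = len(img[0])
    col_hist = [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]
    words: List[Span] = []
    for start, end in runs_above(col_hist, 0):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)
        else:
            words.append((start, end))
    return words
```

Calling `segment_lines(img)` returns row ranges, and each range can be passed to `segment_words(img, line)` to get column ranges for the words on that line. Raising `threshold` above zero is one simple way to tolerate isolated noise pixels between lines, at the cost of possibly clipping thin strokes.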

Index Terms

Computer Science

Information Sciences

Keywords

Line segmentation, Word segmentation, Histogram, Indian documents