Research Article

Text Line Segmentation of Handwritten Documents using Clustering Method based on Thresholding Approach

Published on August 2012 by M. Ravi Kumar, Nayana N Shetty, B. P. Pragath
National Conference on Advanced Computing and Communications 2012
Foundation of Computer Science USA
NCACC - Number 1
August 2012
Authors: M. Ravi Kumar, Nayana N Shetty, B. P. Pragath

M. Ravi Kumar, Nayana N Shetty, B. P. Pragath. Text Line Segmentation of Handwritten Documents using Clustering Method based on Thresholding Approach. National Conference on Advanced Computing and Communications 2012. NCACC, 1 (August 2012), 9-12.

@article{
author = { M. Ravi Kumar and Nayana N Shetty and B. P. Pragath },
title = { Text Line Segmentation of Handwritten Documents using Clustering Method based on Thresholding Approach },
journal = { National Conference on Advanced Computing and Communications 2012 },
issue_date = { August 2012 },
volume = { NCACC },
number = { 1 },
month = { August },
year = { 2012 },
issn = { 0975-8887 },
pages = { 9-12 },
numpages = 4,
url = { /proceedings/ncacc/number1/7989-1005/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Advanced Computing and Communications 2012
%A M. Ravi Kumar
%A Nayana N Shetty
%A B. P. Pragath
%T Text Line Segmentation of Handwritten Documents using Clustering Method based on Thresholding Approach
%J National Conference on Advanced Computing and Communications 2012
%@ 0975-8887
%V NCACC
%N 1
%P 9-12
%D 2012
%I International Journal of Computer Applications
Abstract

Segmentation of text lines in unconstrained handwritten documents is still a challenging task because handwritten text lines are often non-uniformly skewed and curved, and the space between lines is not obvious. In this paper, we propose a text-line segmentation algorithm based on clustering with a threshold. The connected components of the document image are grouped, and the text lines are extracted dynamically from these groups by coloring each text line.
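
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the connected components have already been extracted and are represented by their bounding boxes, and it assumes a simple greedy rule: a box joins the current line when its vertical centre lies within a threshold of that line's running mean centre. The names `group_boxes_into_lines`, `colour_lines`, and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

# Bounding box of one connected component: (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]


def group_boxes_into_lines(boxes: List[Box], threshold: float) -> List[List[Box]]:
    """Greedy threshold clustering of bounding boxes by vertical centre.

    Assumption (not from the paper): boxes are visited from top to bottom,
    and a box joins the most recent line if its vertical centre is within
    `threshold` pixels of that line's running mean centre.
    """
    lines: List[List[Box]] = []
    centres: List[float] = []  # running mean vertical centre of each line

    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        cy = (box[1] + box[3]) / 2
        if lines and abs(cy - centres[-1]) <= threshold:
            lines[-1].append(box)
            # Incremental update of the mean centre of the current line.
            centres[-1] += (cy - centres[-1]) / len(lines[-1])
        else:
            # Too far from the current line: start a new one.
            lines.append([box])
            centres.append(cy)

    # Order the boxes of every line from left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


def colour_lines(lines: List[List[Box]]) -> Dict[Box, int]:
    """Assign each box the index of its line, usable as a colour label."""
    return {box: colour for colour, line in enumerate(lines) for box in line}


if __name__ == "__main__":
    boxes = [(0, 10, 8, 20), (10, 12, 18, 22), (20, 9, 28, 19),
             (0, 40, 8, 50), (10, 43, 18, 53)]
    lines = group_boxes_into_lines(boxes, threshold=10)
    for box, colour in colour_lines(lines).items():
        print(colour, box)
```

A single global threshold of this kind would struggle with strongly skewed or curved lines, which is the difficulty the abstract points out. In practice the threshold would probably need to be derived from the document itself, for example from the average component height.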

Index Terms

Computer Science
Information Sciences

Keywords

Handwritten Document Segmentation Bounding Box Threshold Clustering