Research Article

Text line and word segmentation of Indian Script Handwritten Document

Published on March 2012 by Varsha Hole, Leena Ragha, Pravin Hole
International Conference and Workshop on Emerging Trends in Technology
Foundation of Computer Science USA
ICWET2012 - Number 3
March 2012
Authors: Varsha Hole, Leena Ragha, Pravin Hole

Varsha Hole, Leena Ragha, Pravin Hole. Text line and word segmentation of Indian Script Handwritten Document. International Conference and Workshop on Emerging Trends in Technology. ICWET2012, 3 (March 2012), 25-32.

@article{
author = { Varsha Hole, Leena Ragha, Pravin Hole },
title = { Text line and word segmentation of Indian Script Handwritten Document },
journal = { International Conference and Workshop on Emerging Trends in Technology },
issue_date = { March 2012 },
volume = { ICWET2012 },
number = { 3 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 25-32 },
numpages = 8,
url = { /proceedings/icwet2012/number3/5330-1021/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference and Workshop on Emerging Trends in Technology
%A Varsha Hole
%A Leena Ragha
%A Pravin Hole
%T Text line and word segmentation of Indian Script Handwritten Document
%J International Conference and Workshop on Emerging Trends in Technology
%@ 0975-8887
%V ICWET2012
%N 3
%P 25-32
%D 2012
%I International Journal of Computer Applications
Abstract

Based on an analysis of Indian script character shapes and a literature survey, this paper presents a new sequence of line and word segmentation methods that handles some of the deformations commonly present in handwritten documents, such as touching components, overlapping components, skewed lines, and words with individual skews, and builds a clean text image with these deformations removed. Line segmentation is performed using the Hough transform. Word segmentation computes the distances between adjacent components in the text line image and then classifies each distance as either an inter-word gap or an inter-character gap in a Gaussian mixture modeling framework. The proposed line segmentation method is sufficiently accurate to extract text lines from unconstrained handwritten text documents. The word segmentation procedure also works well on different language scripts. The average word segmentation result is 76% for complex documents and 90% for good documents across the different language scripts.
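
The gap classification step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the gaps are plain horizontal distances in pixels between adjacent components (the paper's actual distance metric is not given here), it fits a two-component one-dimensional Gaussian mixture with a basic EM loop, and the names `gaussian_pdf`, `fit_two_gaussians`, and `classify_gaps` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def fit_two_gaussians(gaps, iterations=50):
    """Fit a two-component 1-D Gaussian mixture to gap widths using EM.

    Assumption: one component models inter-character gaps (small) and
    the other models inter-word gaps (large).
    """
    lo, hi = min(gaps), max(gaps)
    means = [float(lo), float(hi)]          # start at the extremes
    spread = max((hi - lo) / 4.0, 1e-3)
    variances = [spread ** 2, spread ** 2]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: posterior probability of each component for each gap.
        resp = []
        for g in gaps:
            dens = [
                weights[k] * gaussian_pdf(g, means[k], variances[k])
                for k in range(2)
            ]
            total = sum(dens) or 1e-300     # guard against underflow
            resp.append([d / total for d in dens])

        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue                    # component has no support
            weights[k] = nk / len(gaps)
            means[k] = sum(r[k] * g for r, g in zip(resp, gaps)) / nk
            var = sum(r[k] * (g - means[k]) ** 2 for r, g in zip(resp, gaps)) / nk
            variances[k] = max(var, 1e-3)   # floor avoids a collapsed component

    return weights, means, variances


def classify_gaps(gaps):
    """Return one boolean per gap: True for inter-word, False for inter-character."""
    weights, means, variances = fit_two_gaussians(gaps)
    word = 0 if means[0] > means[1] else 1  # component with the larger mean
    char = 1 - word
    labels = []
    for g in gaps:
        p_word = weights[word] * gaussian_pdf(g, means[word], variances[word])
        p_char = weights[char] * gaussian_pdf(g, means[char], variances[char])
        labels.append(p_word > p_char)
    return labels


if __name__ == "__main__":
    # Hypothetical gap widths (pixels) measured along one text line.
    sample_gaps = [3, 4, 2, 3, 18, 4, 3, 20, 2, 3]
    print(classify_gaps(sample_gaps))
```

Each gap labelled as inter-word marks a boundary at which the text line would be cut into words. A mixture model is used here instead of a fixed threshold because gap widths vary with the writer and the script, so the split point is estimated from the line's own gap distribution.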

Index Terms

Computer Science
Information Sciences

Keywords

Optical character recognition, Pre-processing, Global skew detection and correction, Line segmentation, Word segmentation