Research Article

Line-wise Script Segmentation for Indian Language Documents

by Manoj Kumar Shukla, Haider Banka
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 108 - Number 9
Year of Publication: 2014
Authors: Manoj Kumar Shukla, Haider Banka
10.5120/18943-0411

Manoj Kumar Shukla, Haider Banka. Line-wise Script Segmentation for Indian Language Documents. International Journal of Computer Applications. 108, 9 (December 2014), 34-37. DOI=10.5120/18943-0411

@article{ 10.5120/18943-0411,
author = { Manoj Kumar Shukla and Haider Banka },
title = { Line-wise Script Segmentation for Indian Language Documents },
journal = { International Journal of Computer Applications },
issue_date = { December 2014 },
volume = { 108 },
number = { 9 },
month = { December },
year = { 2014 },
issn = { 0975-8887 },
pages = { 34-37 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume108/number9/18943-0411/ },
doi = { 10.5120/18943-0411 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:42:34.935054+05:30
%A Manoj Kumar Shukla
%A Haider Banka
%T Line-wise Script Segmentation for Indian Language Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 9
%P 34-37
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In a multilingual country like India, script segmentation (script separation) of a multi-script document page image is of primary importance for a script identification system. For such a page, the multi-script content must be segmented before an individual OCR is run for each script. In this paper we present a technique for script segmentation of individual text lines in printed Indian language documents. Our line-wise script segmentation approach is based on the Horizontal Projection Profile. A prototype of the system has been tested on printed lines of Indian language script, and an average accuracy of 99% has been achieved.
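
The Horizontal Projection Profile named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's algorithm, so this only shows the general idea: count the ink pixels in each row of a binarized page and treat each run of non-empty rows as one text line. The function names, the `min_ink` threshold, and the toy page are assumptions made for illustration.

```python
def horizontal_projection_profile(binary_image):
    """Count ink pixels (value 1) in each row of a binarized image."""
    return [sum(row) for row in binary_image]


def segment_lines(binary_image, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    A row counts as text when its profile value is at least `min_ink`.
    `min_ink` is a hypothetical threshold, not a value from the paper.
    """
    profile = horizontal_projection_profile(binary_image)
    lines = []
    start = None  # first row of the line currently being scanned
    for row, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = row                  # a new text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, row))   # a blank row closes the line
            start = None
    if start is not None:                # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 10x8 page with two "text lines" (1 = ink, 0 = background).
page = [[0] * 8 for _ in range(10)]
for r in (1, 2):
    for c in range(1, 7):
        page[r][c] = 1
for r in (5, 6, 7):
    for c in range(2, 6):
        page[r][c] = 1

profile = horizontal_projection_profile(page)
lines = segment_lines(page)
```

Each returned row range could then be cropped and passed to a script identifier for that line. How the script of a line is decided from its profile is not specified in the abstract and is left out of this sketch.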

Index Terms

Computer Science
Information Sciences

Keywords

Script line documents OCR