Character Level Separation and Identification of English and Gujarati Digits from Bilingual (English-Gujarati) Printed Documents

IJCA Proceedings on International Conference in Computational Intelligence (ICCIA2012)
© 2012 by IJCA Journal
iccia - Number 3
Year of Publication: 2012
Authors:
Shailesh A. Chaudhari
Ravi M. Gulati

Shailesh A Chaudhari and Ravi M Gulati. Article: Character Level Separation and Identification of English and Gujarati Digits from Bilingual (English-Gujarati) Printed Documents. IJCA Proceedings on International Conference in Computational Intelligence (ICCIA 2012) ICCIA(3):9-13, March 2012. Full text available. BibTeX

@article{key:article,
	author = {Shailesh A. Chaudhari and Ravi M. Gulati},
	title = {Article: Character Level Separation and Identification of English and Gujarati Digits from Bilingual (English-Gujarati) Printed Documents},
	journal = {IJCA Proceedings on International Conference in Computational Intelligence (ICCIA 2012)},
	year = {2012},
	volume = {ICCIA},
	number = {3},
	pages = {9-13},
	month = {March},
	note = {Full text available}
}

Abstract

English script is now frequently interspersed within documents written in Indian languages, so there is a need for an optical character recognition (OCR) system that can recognize such bilingual documents and store them for future use. This paper proposes an OCR system that reads documents containing Gujarati and English scripts (digits only). The two scripts have many features in common, so a single system can be modelled to recognize both. A template matching classifier is used, with a normalized feature vector as the feature for classifying English and Gujarati digits. The system performs well on multi-font, size-independent printed bilingual English-Gujarati digits. At the character level, it achieves an average classification rate of 98.30% for Gujarati digits and 98.88% for English digits.
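
The abstract does not give implementation details, so the following is only a minimal sketch of how a template matching classifier over normalized feature vectors could work, not the authors' method. It assumes each segmented character is a binary image (1 = ink), normalizes it by cropping to the inked region and resampling to a fixed 8 x 8 grid (one plausible way to obtain size independence), and picks the stored template with the smallest Hamming distance. The names `crop_to_ink`, `normalized_vector`, and `classify`, the grid size, and the distance measure are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Glyph = List[List[int]]   # binary character image: 1 = ink, 0 = background
Label = Tuple[str, int]   # hypothetical label: (script, digit), e.g. ("Gujarati", 3)


def crop_to_ink(glyph: Glyph) -> Glyph:
    """Trim empty rows and columns surrounding the inked region."""
    rows = [r for r, row in enumerate(glyph) if any(row)]
    cols = [c for c in range(len(glyph[0])) if any(row[c] for row in glyph)]
    if not rows or not cols:
        raise ValueError("glyph contains no ink")
    return [row[cols[0]:cols[-1] + 1] for row in glyph[rows[0]:rows[-1] + 1]]


def normalized_vector(glyph: Glyph, size: int = 8) -> List[int]:
    """Crop the glyph, resample it to size x size (nearest neighbour), flatten."""
    cropped = crop_to_ink(glyph)
    height, width = len(cropped), len(cropped[0])
    return [
        cropped[r * height // size][c * width // size]
        for r in range(size)
        for c in range(size)
    ]


def classify(glyph: Glyph, templates: Dict[Label, List[int]], size: int = 8) -> Label:
    """Return the label of the template closest to the glyph (Hamming distance)."""
    vector = normalized_vector(glyph, size)

    def distance(label: Label) -> int:
        # Count the grid cells where the glyph and the template disagree.
        return sum(a != b for a, b in zip(vector, templates[label]))

    return min(templates, key=distance)
```

In this sketch a single `templates` dictionary would hold entries for both scripts, so one call to `classify` yields the script and the digit together. A practical system would likely store several templates per digit to cover multiple fonts.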
