Character Level Separation and Identification of English and Gujarati Digits from Bilingual (English-Gujarati) Printed Documents

Research Article

Published on March 2012 by Shailesh A. Chaudhari, Ravi M. Gulati

International Conference in Computational Intelligence

Foundation of Computer Science USA

ICCIA - Number 3

March 2012

Authors: Shailesh A. Chaudhari, Ravi M. Gulati

Shailesh A. Chaudhari, Ravi M. Gulati. Character Level Separation and Identification of English and Gujarati Digits from Bilingual (English-Gujarati) Printed Documents. International Conference in Computational Intelligence. ICCIA, 3 (March 2012), 9-13.

@article{
author = { Shailesh A. Chaudhari and Ravi M. Gulati },
title = { Character Level Separation and Identification of English and Gujarati Digits from Bilingual (English-Gujarati) Printed Documents },
journal = { International Conference in Computational Intelligence },
issue_date = { March 2012 },
volume = { ICCIA },
number = { 3 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 9-13 },
numpages = 5,
url = { /proceedings/iccia/number3/5109-1021/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Proceeding Article

%1 International Conference in Computational Intelligence

%A Shailesh A. Chaudhari

%A Ravi M. Gulati

%T Character Level Separation and Identification of English and Gujarati Digits from Bilingual (English-Gujarati) Printed Documents

%J International Conference in Computational Intelligence

%@ 0975-8887

%V ICCIA

%N 3

%P 9-13

%D 2012

%I International Journal of Computer Applications

Abstract

English text is now frequently interspersed within documents written in Indian languages, so there is a need for an optical character recognition (OCR) system that can recognize such bilingual documents and store them for future use. This paper proposes an OCR system that reads documents containing Gujarati and English scripts (digits only). The two scripts have many features in common, so a single system can be modelled to recognize both. A template matching classifier is used, with a normalized feature vector as the feature for classifying English and Gujarati digits. The system performs well on multi-font, size-independent printed bilingual English-Gujarati digits. At character level, it achieves an average classification rate of 98.30% for Gujarati digits and 98.88% for English digits.
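The abstract gives no implementation details, so the following Python sketch only illustrates the general idea of template matching on a size-normalized feature vector. Everything in it is an assumption made for illustration: the 8x8 normalization grid, nearest-neighbour resampling, Pearson correlation as the similarity score, the toy 5x5 glyphs `ONE` and `ZERO`, and all function names. It is not the authors' method or data.

```python
from math import sqrt

def crop_to_bounding_box(img):
    """Trim empty rows and columns around the foreground pixels of a binary glyph."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        raise ValueError("glyph has no foreground pixels")
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def normalize(img, size=8):
    """Crop, resample to size x size (nearest neighbour), and flatten to a vector."""
    g = crop_to_bounding_box(img)
    h, w = len(g), len(g[0])
    return [float(g[r * h // size][c * w // size])
            for r in range(size) for c in range(size)]

def correlation(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

def classify(img, templates, size=8):
    """Return (label, score) for the template that correlates best with the glyph."""
    v = normalize(img, size)
    return max(((label, correlation(v, t)) for label, t in templates.items()),
               key=lambda pair: pair[1])

def upscale(img, k):
    """Enlarge a glyph by repeating every pixel k times in both directions."""
    return [[p for p in row for _ in range(k)] for row in img for _ in range(k)]

# Hypothetical toy glyphs (not real font data).
ONE = [[0, 0, 1, 0, 0],
       [0, 1, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 1, 1, 1, 0]]
ZERO = [[0, 1, 1, 1, 0],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [0, 1, 1, 1, 0]]

templates = {"1": normalize(ONE), "0": normalize(ZERO)}

# A glyph printed at twice the size still matches its template after normalization.
label, score = classify(upscale(ONE, 2), templates)
```

In a bilingual setting, each template label could also carry the script (for example a pair such as `("gujarati", "3")`), so that the best match identifies the script and the digit together. That labelling scheme is likewise an assumption, not something stated in the abstract.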

Index Terms

Computer Science

Information Sciences

Keywords

Segmentation Normalization Vector Template Correlation