Extracting Text from Telugu Color Documents by Removing Dither Patterns

Siva Rama Sastry Gumma

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 181 - Number 48

Year of Publication: 2019

Authors: Siva Rama Sastry Gumma

10.5120/ijca2019918653

Siva Rama Sastry Gumma. Extracting Text from Telugu Color Documents by Removing Dither Patterns. International Journal of Computer Applications. 181, 48 (Apr 2019), 1-7. DOI=10.5120/ijca2019918653

@article{ 10.5120/ijca2019918653,

author = { Siva Rama Sastry Gumma },

title = { Extracting Text from Telugu Color Documents by Removing Dither Patterns },

journal = { International Journal of Computer Applications },

issue_date = { Apr 2019 },

volume = { 181 },

number = { 48 },

month = { Apr },

year = { 2019 },

issn = { 0975-8887 },

pages = { 1-7 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume181/number48/30476-2019918653/ },

doi = { 10.5120/ijca2019918653 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-07T01:09:26.331599+05:30

%A Siva Rama Sastry Gumma

%T Extracting Text from Telugu Color Documents by Removing Dither Patterns

%J International Journal of Computer Applications

%@ 0975-8887

%V 181

%N 48

%P 1-7

%D 2019

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Preprocessing is an important step in the development of an Optical Character Recognition (OCR) system. It comprises modules such as binarization and skew detection and correction; this paper addresses the binarization module. Although many algorithms exist for binarizing document images, few address printed color documents, because such documents contain dither patterns, normal text, reversed text, and colored text overlaid on colored backgrounds, while drawings and graphics appear in millions of different colors. Preprocessing colored documents is therefore a challenging task. For printed color documents, this paper presents the elimination of dither patterns using a Butterworth band-reject filter, and text extraction by eliminating graphics based on component height. Results on a corpus of newspapers published in Telugu show that the proposed method is promising.
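The dither-removal step named in the abstract can be illustrated with the textbook radial Butterworth band-reject filter applied in the frequency domain. The following is a minimal sketch, not the paper's implementation: the abstract gives no filter parameters, so the function names and the parameters `d0` (centre radius of the rejected band), `width` (band width), and `order` are assumptions chosen for illustration, and the input is assumed to be a single grayscale channel.

```python
import numpy as np


def butterworth_band_reject(shape, d0, width, order=2):
    """Radial Butterworth band-reject transfer function for a centred spectrum.

    d0, width and order are illustrative parameters (assumptions); the
    abstract does not state the values used in the paper.
    """
    rows, cols = shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    # Distance of every frequency sample from the centre of the shifted spectrum.
    d = np.hypot(u[:, None], v[None, :])
    # H = 1 / (1 + (D * W / (D^2 - D0^2))^(2n)).
    # On the ring D == D0 the ratio is set to infinity so that H becomes 0.
    denom = d**2 - d0**2
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(denom == 0, np.inf, (d * width) / denom)
    return 1.0 / (1.0 + ratio ** (2 * order))


def remove_dither(gray, d0, width, order=2):
    """Suppress a periodic pattern whose spectral energy lies near radius d0."""
    # Move the zero-frequency term to the centre so it lines up with the filter.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    h = butterworth_band_reject(gray.shape, d0, width, order)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * h))
    # The input is real, so any imaginary part is numerical noise.
    return np.real(filtered)
```

In use, `remove_dither` would be called on a grayscale page image with `d0` set to the radius at which the dither peaks appear in the magnitude spectrum; the result can then be passed to a binarization step. A radial band is only one possible choice, since a dither pattern with a few isolated spectral peaks could also be handled with notch filters.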

Index Terms

Computer Science

Information Sciences

Keywords

OCR, threshold, binarization, dither patterns, connected components