Research Article

Extracting Text from Telugu Color Documents by Removing Dither Patterns

by Siva Rama Sastry Gumma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 181 - Number 48
Year of Publication: 2019
Authors: Siva Rama Sastry Gumma
10.5120/ijca2019918653

Siva Rama Sastry Gumma. Extracting Text from Telugu Color Documents by Removing Dither Patterns. International Journal of Computer Applications. 181, 48 (Apr 2019), 1-7. DOI=10.5120/ijca2019918653

@article{ 10.5120/ijca2019918653,
author = { Siva Rama Sastry Gumma },
title = { Extracting Text from Telugu Color Documents by Removing Dither Patterns },
journal = { International Journal of Computer Applications },
issue_date = { Apr 2019 },
volume = { 181 },
number = { 48 },
month = { Apr },
year = { 2019 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume181/number48/30476-2019918653/ },
doi = { 10.5120/ijca2019918653 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:09:26.331599+05:30
%A Siva Rama Sastry Gumma
%T Extracting Text from Telugu Color Documents by Removing Dither Patterns
%J International Journal of Computer Applications
%@ 0975-8887
%V 181
%N 48
%P 1-7
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Preprocessing is an important step in the development of an Optical Character Recognition (OCR) system. It comprises several modules, such as binarization and skew detection and correction. This paper addresses the binarization module. Although many algorithms exist for binarizing document images, few handle printed color images, because printed color documents contain dither patterns, normal text, reversed text, and colored text overlaid on colored backgrounds, while drawings and graphics appear in millions of different colors. Preprocessing colored documents is therefore a challenging task. For printed color documents, this paper presents the elimination of dither patterns using a Butterworth band-reject filter, and text extraction that eliminates graphics based on component height. Results on a corpus of newspapers published in Telugu are promising.
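The abstract does not give the filter parameters or implementation, so the following is only a minimal sketch of how a Butterworth band-reject filter can suppress a periodic dither pattern in the frequency domain. It assumes the standard textbook transfer function H(u,v) = 1 / (1 + (D*W / (D^2 - D0^2))^(2n)), where D is the distance from the center of the shifted spectrum. The function names and the parameters d0 (band center), width (W) and order (n) are hypothetical and chosen for illustration; they are not values taken from the paper.

```python
import numpy as np


def butterworth_band_reject(shape, d0, width, order=2):
    """Return a band-reject transfer function laid out like an fftshift-ed spectrum.

    d0, width and order are illustrative assumptions, not the paper's values.
    """
    rows, cols = shape
    # Frequency coordinates with the zero frequency at index n // 2,
    # which is where np.fft.fftshift places it.
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    d = np.hypot(u[:, None], v[None, :])
    denom = d ** 2 - d0 ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.abs(d * width / denom)
        h = 1.0 / (1.0 + ratio ** (2 * order))
    # Exactly on the band center the ideal response is zero (full rejection).
    h[denom == 0] = 0.0
    return h


def remove_dither(gray, d0, width, order=2):
    """Filter a 2-D grayscale array and return the real-valued result."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    h = butterworth_band_reject(gray.shape, d0, width, order)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * h))
    return np.real(filtered)
```

In this sketch the response is zero on the ring of radius d0 and one at the zero frequency, so a pattern whose spatial frequency lies on that ring is removed while the mean intensity is preserved. For a color document, one plausible approach (again an assumption) is to apply the filter to each channel or to a grayscale conversion, after locating the dither peak in the magnitude spectrum to choose d0.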

Index Terms

Computer Science
Information Sciences

Keywords

OCR, threshold, binarization, dither patterns, connected components