
Extracting Text from Telugu Color Documents by Removing Dither Patterns

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2019
Authors:
Siva Rama Sastry Gumma
10.5120/ijca2019918653

Siva Rama Sastry Gumma. Extracting Text from Telugu Color Documents by Removing Dither Patterns. International Journal of Computer Applications 181(48):1-7, April 2019.

BibTeX

@article{10.5120/ijca2019918653,
	author = {Siva Rama Sastry Gumma},
	title = {Extracting Text from Telugu Color Documents by Removing Dither Patterns},
	journal = {International Journal of Computer Applications},
	issue_date = {April 2019},
	volume = {181},
	number = {48},
	month = {Apr},
	year = {2019},
	issn = {0975-8887},
	pages = {1-7},
	numpages = {7},
	url = {http://www.ijcaonline.org/archives/volume181/number48/30476-2019918653},
	doi = {10.5120/ijca2019918653},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Preprocessing is an important step in the development of an Optical Character Recognition (OCR) system. Preprocessing comprises several modules, such as binarization and skew detection and correction; this paper discusses the binarization module. Although there are many algorithms for binarizing document images, there are fewer for binarizing printed color documents, because such documents contain dither patterns, normal text, reversed text, and colored text overlaid on colored backgrounds, along with drawings and graphics that appear in millions of different colors. Preprocessing color documents is therefore a challenging task. For printed color documents, the paper presents the elimination of dither patterns using a Butterworth band-reject filter, and text extraction by eliminating graphics based on the height of connected components. Results on a corpus of Telugu newspapers show that the proposed method is promising.
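The abstract does not give the filter's parameters, so the following is only a minimal sketch of how a Butterworth band-reject filter can suppress a periodic dither pattern in the frequency domain. It assumes a single grayscale channel stored as a list of lists and a dither frequency radius that is already known; the function names and the values of d0 (band centre), w (bandwidth) and order are hypothetical illustrations, not the paper's settings. It uses the textbook transfer function H(u, v) = 1 / (1 + [D(u, v) W / (D(u, v)^2 - D0^2)]^(2n)) and a naive DFT so that it runs without third-party libraries.

```python
import cmath
import math


def dft_1d(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = 0j
        for t in range(n):
            s += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(s / n if inverse else s)
    return out


def dft_2d(img, inverse=False):
    """Separable 2-D DFT: transform every row, then every column."""
    rows = [dft_1d(row, inverse) for row in img]
    cols = [dft_1d(list(col), inverse) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major


def butterworth_band_reject(m, n, d0, w, order):
    """Band-reject transfer function H(u, v) in unshifted DFT layout.

    D(u, v) is the distance from zero frequency; indices above m/2 or n/2
    wrap around to negative frequencies, so no fftshift is needed.
    """
    h = []
    for u in range(m):
        du = min(u, m - u)
        row = []
        for v in range(n):
            dv = min(v, n - v)
            d = math.hypot(du, dv)
            if math.isclose(d * d, d0 * d0):
                row.append(0.0)  # exactly on the rejected ring
            else:
                ratio = abs(d * w / (d * d - d0 * d0))
                row.append(1.0 / (1.0 + ratio ** (2 * order)))
        h.append(row)
    return h


def remove_dither(gray, d0, w, order=2):
    """Attenuate frequencies near radius d0 and return the filtered image.

    d0, w and order are hypothetical tuning values chosen by the caller.
    """
    m, n = len(gray), len(gray[0])
    spectrum = dft_2d(gray)
    h = butterworth_band_reject(m, n, d0, w, order)
    filtered = [[spectrum[u][v] * h[u][v] for v in range(n)] for u in range(m)]
    return [[z.real for z in row] for row in dft_2d(filtered, inverse=True)]


# Usage: a flat gray page (value 100) overlaid with a period-4 stripe "dither".
img = [[100 + 20 * math.cos(2 * math.pi * v / 4) for v in range(16)]
       for _ in range(16)]
clean = remove_dither(img, d0=4, w=2, order=2)
```

On a real page one would use an FFT (for example `numpy.fft`) rather than the naive DFT, and would first have to locate the dither frequency in the spectrum; filtering each color channel separately is one possible way to handle color input, but that is an assumption, since the abstract does not say how color is treated.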

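The abstract states that graphics are eliminated using the height of the component but does not say how the height threshold is chosen. A minimal sketch of that idea, assuming a binarized page (1 = foreground) and a hypothetical fixed `max_height`, labels 8-connected components with breadth-first search and erases every component whose bounding-box height exceeds the threshold, on the assumption that tall components are graphics rather than text.

```python
from collections import deque


def label_components(binary):
    """Return 8-connected foreground components as lists of (row, col) pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def keep_text_components(binary, max_height):
    """Copy the image, erasing components taller than max_height (hypothetical threshold)."""
    out = [row[:] for row in binary]
    for pixels in label_components(binary):
        ys = [y for y, _ in pixels]
        if max(ys) - min(ys) + 1 > max_height:
            for y, x in pixels:
                out[y][x] = 0  # assumed to be graphics, not text
    return out


# Usage: a small 2x2 "character" plus a tall vertical rule spanning the page.
page = [[0] * 10 for _ in range(10)]
for y in (1, 2):
    for x in (1, 2):
        page[y][x] = 1
for y in range(10):
    page[y][7] = 1
text_only = keep_text_components(page, max_height=5)
```

A fixed threshold is the simplest choice; deriving it from the page itself, for example from the typical height of text components, would be a natural refinement, but the abstract does not specify either approach.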

Keywords

OCR, threshold, binarization, dither patterns, connected components