Research Article

Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study

by Chirag Patel, Atul Patel, Dharmendra Patel
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 55 - Number 10
Year of Publication: 2012
Authors: Chirag Patel, Atul Patel, Dharmendra Patel
10.5120/8794-2784

Chirag Patel, Atul Patel, Dharmendra Patel. Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study. International Journal of Computer Applications. 55, 10 (October 2012), 50-56. DOI=10.5120/8794-2784

@article{ 10.5120/8794-2784,
author = { Chirag Patel and Atul Patel and Dharmendra Patel },
title = { Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study },
journal = { International Journal of Computer Applications },
issue_date = { October 2012 },
volume = { 55 },
number = { 10 },
month = { October },
year = { 2012 },
issn = { 0975-8887 },
pages = { 50-56 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume55/number10/8794-2784/ },
doi = { 10.5120/8794-2784 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:56:55.725173+05:30
%A Chirag Patel
%A Atul Patel
%A Dharmendra Patel
%T Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study
%J International Journal of Computer Applications
%@ 0975-8887
%V 55
%N 10
%P 50-56
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Optical character recognition (OCR) is used to convert printed text into editable text, and it is a useful and popular method in a wide range of applications. The accuracy of OCR can depend on the text preprocessing and segmentation algorithms used. Text is sometimes difficult to retrieve from an image because of variations in size, style, and orientation, or because of a complex image background. This paper begins with an introduction to OCR, then discusses the history of the open source OCR tool Tesseract, its architecture, and the experimental results of OCR performed by Tesseract on different kinds of images. The paper concludes with a comparative study of Tesseract and the commercial OCR tool Transym OCR, using vehicle number plates as input: we tried to extract the vehicle number from each plate using both tools and compared them on various parameters.
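The abstract does not list the comparison parameters, so the following is only a minimal sketch of one plausible parameter: character-level accuracy of a recognized plate string against a known ground truth. The function names, the plate value `GJ05AB1234`, and the choice of edit distance as the metric are illustrative assumptions, not the authors' method. The optional `ocr_with_tesseract` helper assumes the third-party `pytesseract` and `Pillow` packages and a local Tesseract installation; it is not needed for the scoring functions.

```python
import re


def normalize_plate(raw: str) -> str:
    """Upper-case the OCR output and keep only ASCII letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def character_accuracy(recognized: str, truth: str) -> float:
    """Return 1 - (edit distance / ground-truth length), floored at 0."""
    rec, ref = normalize_plate(recognized), normalize_plate(truth)
    if not ref:
        return 1.0 if not rec else 0.0
    return max(0.0, 1.0 - edit_distance(rec, ref) / len(ref))


def ocr_with_tesseract(image_path: str) -> str:
    """Assumed helper: run Tesseract on one image via pytesseract."""
    # Imported lazily so the scoring code works without these packages.
    import pytesseract
    from PIL import Image
    # "--psm 7" asks Tesseract to treat the image as a single text line.
    return pytesseract.image_to_string(Image.open(image_path), config="--psm 7")


if __name__ == "__main__":
    # Hypothetical OCR output in which the digit 0 was read as the letter O.
    print(character_accuracy("GJ O5 AB 1234", "GJ05AB1234"))
```

The same scoring function can be applied to the output of any OCR engine, which keeps the comparison independent of how each tool is invoked.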

Index Terms

Computer Science
Information Sciences

Keywords

Optical Character Recognition (OCR), Open Source, DLL, Tesseract, Transym