Research Article

An Open Source Tesseract based Tool for Extracting Text from Images with Application in Braille Translation for the Visually Impaired

by Pijush Chakraborty, Arnab Mallik
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 68 - Number 16
Year of Publication: 2013
Authors: Pijush Chakraborty, Arnab Mallik
DOI: 10.5120/11664-7254

Pijush Chakraborty, Arnab Mallik. An Open Source Tesseract based Tool for Extracting Text from Images with Application in Braille Translation for the Visually Impaired. International Journal of Computer Applications. 68, 16 (April 2013), 26-32. DOI=10.5120/11664-7254

@article{ 10.5120/11664-7254,
author = { Pijush Chakraborty and Arnab Mallik },
title = { An Open Source Tesseract based Tool for Extracting Text from Images with Application in Braille Translation for the Visually Impaired },
journal = { International Journal of Computer Applications },
issue_date = { April 2013 },
volume = { 68 },
number = { 16 },
month = { April },
year = { 2013 },
issn = { 0975-8887 },
pages = { 26-32 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume68/number16/11664-7254/ },
doi = { 10.5120/11664-7254 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:28:02.475940+05:30
%A Pijush Chakraborty
%A Arnab Mallik
%T An Open Source Tesseract based Tool for Extracting Text from Images with Application in Braille Translation for the Visually Impaired
%J International Journal of Computer Applications
%@ 0975-8887
%V 68
%N 16
%P 26-32
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Valuable paper documents are often scanned and stored as images for backup. Extracting the text from such images is useful, so there is a standing need for a tool that performs this extraction. One important application of such a tool is Braille translation. Braille has been the primary reading and writing system used by the visually impaired since the 19th century. An application that extracts text from images and converts it to Braille is therefore useful for converting valuable old documents and books into Braille format. This paper presents the complete methodology used to extract text from scanned images and to translate that text to Braille. The scanned images are first pre-processed: each image is converted to grayscale and then passed through an adaptive threshold function to produce a binary image. The binary image is then sent for recognition to Google's Tesseract engine, which is considered to be the best open source OCR engine currently available. The recognized text is post-processed with the spell-checking API JOrtho to remove errors introduced during recognition. The corrected text is finally translated to six-dot cell Braille using a set of rules provided by www.iceb.org. The translation to Braille covers numbers, letters, symbols and compound letters. The translated text can then be saved for printing later or sent to a refreshable Braille display.
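
The pre-processing step named in the abstract (grayscale conversion followed by adaptive thresholding) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the local-mean rule, and the `window` and `offset` parameters are assumptions chosen for illustration, and the image is represented as plain nested lists so that the example needs no external library.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def adaptive_threshold(gray, window=3, offset=5):
    """Binarize a grayscale image with a local-mean rule (an assumed variant).

    A pixel becomes 1 (ink) when it is darker than the mean of its
    window x window neighbourhood by more than `offset`, otherwise 0.
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    binary = []
    for y in range(height):
        row = []
        for x in range(width):
            # Neighbourhood coordinates, clipped at the image borders.
            ys = range(max(0, y - half), min(height, y + half + 1))
            xs = range(max(0, x - half), min(width, x + half + 1))
            values = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(values) / len(values)
            row.append(1 if gray[y][x] < local_mean - offset else 0)
        binary.append(row)
    return binary


if __name__ == "__main__":
    # A light 3x3 patch with one dark pixel in the centre.
    patch = [[200, 200, 200], [200, 50, 200], [200, 200, 200]]
    for line in adaptive_threshold(patch):
        print(line)
```

Because the threshold is computed per neighbourhood rather than once for the whole page, unevenly lit scans can still separate ink from background, which is the usual motivation for an adaptive rather than a global threshold.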

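The final translation step can be sketched in the same spirit. The example below maps letters, digits and capitals to six-dot cells and emits them as Unicode braille patterns. It is a simplified, uncontracted sketch under stated assumptions: the names `to_braille`, `LETTERS` and `DIGITS` are hypothetical, the output encoding is Unicode rather than the Braille ASCII listed in the keywords, compound letters (contractions) are not handled, and a letter that directly follows a digit is not given any special indicator.

```python
# Dot numbers (1-6) for the first decade a-j; the other letters derive from it.
_DECADE = {"a": "1", "b": "12", "c": "14", "d": "145", "e": "15",
           "f": "124", "g": "1245", "h": "125", "i": "24", "j": "245"}


def _cell(dots):
    """Build a Unicode braille character from a string of dot numbers."""
    return chr(0x2800 + sum(1 << (int(d) - 1) for d in dots))


def _build_letters():
    dots = dict(_DECADE)
    for base, letter in zip("abcdefghij", "klmnopqrst"):
        dots[letter] = _DECADE[base] + "3"      # second decade adds dot 3
    for base, letter in zip("abcde", "uvxyz"):
        dots[letter] = _DECADE[base] + "36"     # third decade adds dots 3 and 6
    dots["w"] = "2456"                          # w falls outside the pattern
    return {letter: _cell(d) for letter, d in dots.items()}


LETTERS = _build_letters()
# Digits 1-9 and 0 reuse the cells of a-j, preceded by the number sign.
DIGITS = {digit: LETTERS[letter]
          for digit, letter in zip("1234567890", "abcdefghij")}
NUMBER_SIGN = _cell("3456")
CAPITAL = _cell("6")
SPACE = chr(0x2800)


def to_braille(text):
    """Translate text to uncontracted six-dot braille (illustrative only)."""
    out = []
    in_number = False
    for ch in text:
        if ch in DIGITS:
            if not in_number:
                out.append(NUMBER_SIGN)         # one sign per run of digits
                in_number = True
            out.append(DIGITS[ch])
            continue
        in_number = False
        if ch == " ":
            out.append(SPACE)
        elif ch.lower() in LETTERS:
            if ch.isupper():
                out.append(CAPITAL)
            out.append(LETTERS[ch.lower()])
        else:
            out.append(ch)                      # unmapped symbols pass through
    return "".join(out)


if __name__ == "__main__":
    print(to_braille("Ab 12"))
```

For the input `"Ab 12"` the function emits a capital indicator, the cells for a and b, a blank cell, a number sign, and then the cells for 1 and 2, which reuse the shapes of a and b.
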
References
  1. Tesseract Project Site: http://code.google.com/p/tesseractocr.
  2. Ray Smith, Chris Newton, Phil Cheatle, Adaptive Threshold for OCR: A Significant Test, HP Laboratories Bristol, March 1993
  3. R. Smith, An Overview of the Tesseract OCR Engine, Proc. Ninth Int. Conference on Document Analysis and Recognition, IEEE Computer Society (2007)
  4. Ray Smith, Tesseract OCR Engine, OSCON Conference 2007
  5. Chirag Patel, Atul Patel, Dharmendra Patel, Optical Character Recognition by Open source OCR Tool Tesseract: A Case Study, IJCA Volume 55 Issue 10, October 2012
  6. Tess4J Project Site: http://tess4j.sourceforge.net/
  7. JOrtho Project Site: http://jortho.sourceforge.net/
  8. Soundex Reference: http://en.wikipedia.org/wiki/Soundex
  9. The Rules of Unified English Braille, International Council on English Braille (ICEB), June 2001
  10. Braille ASCII: http://en.wikipedia.org/wiki/Braille_ASCII
  11. Paul Blenkhorn, A System for Converting Braille to Print, IEEE Transactions on Rehabilitation Engineering, Vol. 3 No. , June 1995
  12. Manzeet Singh, Parteek Bhatia, Automated Conversion of English and Hindi Text to Braille Representation, IJCA Volume 4 Issue 6, April 2010
  13. Md. Abul Hasnat, Muttakinur Rahman Chowdhury, Mumit Khan, An open source Tesseract based Optical Character Recognizer for Bangla script, 10th International Conference on Document Analysis and Recognition, 2009
  14. BrailleOCR Project Site: https://code.google.com/p/brailleocr/
Index Terms

Computer Science
Information Sciences

Keywords

OCR, Tesseract, Tess4J, JOrtho, Phonetic Matching, Soundex, Braille, Braille Translation, Braille ASCII