
Head Mounted Device for Real World Text to Speech Conversion

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Nikhil Varghese, Gaurav Tripathi

Nikhil Varghese and Gaurav Tripathi. Head Mounted Device for Real World Text to Speech Conversion. International Journal of Computer Applications 155(5):16-20, December 2016.

BibTeX

	author = {Nikhil Varghese and Gaurav Tripathi},
	title = {Head Mounted Device for Real World Text to Speech Conversion},
	journal = {International Journal of Computer Applications},
	issue_date = {December 2016},
	volume = {155},
	number = {5},
	month = {Dec},
	year = {2016},
	issn = {0975-8887},
	pages = {16-20},
	numpages = {5},
	url = {},
	doi = {10.5120/ijca2016912309},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Despite several advances in technology, there is still no low-cost aid for visually impaired people. This paper presents a mobile head-mounted device that detects text in natural scenes and converts it to speech. The major components of the device are a Raspberry Pi, a high-definition webcam, earphones and a portable power bank. The webcam, connected to the Raspberry Pi, captures the image. A text detection algorithm based on Class-Specific Extremal Regions (CSERs) locates the text in complex natural scenes. The segmented image is then passed to the Tesseract OCR engine for text recognition. The recognized text is converted to audio using the espeak Python module on the Raspberry Pi. A visually impaired person can thus use the device to hear the text in their surroundings, such as shop names, public notices, billboards and road directions.
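The detect, recognize and speak stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the detector and recognizer are passed in as plain functions (hypothetical stand-ins for the CSER detector and the Tesseract OCR engine), `read_scene`, `build_espeak_command` and `speak` are illustrative names, and speech is produced by calling the `espeak` command-line tool rather than the espeak Python module named in the abstract. The speaking rate of 150 words per minute is an arbitrary assumption.

```python
import shutil
import subprocess
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")    # a captured frame, e.g. a webcam image
Region = TypeVar("Region")  # a cropped text region found by the detector


def read_scene(
    image: Image,
    detect_regions: Callable[[Image], Sequence[Region]],
    recognize: Callable[[Region], str],
) -> str:
    """Detect text regions in one frame, OCR each region, and join the results.

    detect_regions stands in for the CSER-based detector and recognize
    for the OCR engine; both are assumptions supplied by the caller.
    """
    words = [recognize(region).strip() for region in detect_regions(image)]
    # Drop regions where the recognizer returned nothing usable.
    return " ".join(word for word in words if word)


def build_espeak_command(text: str, words_per_minute: int = 150) -> List[str]:
    """Build the argument list for the espeak CLI (-s sets words per minute)."""
    return ["espeak", "-s", str(words_per_minute), text]


def speak(text: str) -> bool:
    """Speak the text if espeak is installed; return whether it was invoked."""
    if not text or shutil.which("espeak") is None:
        return False
    subprocess.run(build_espeak_command(text), check=False)
    return True
```

Passing the stages in as functions keeps the sketch independent of any particular detector or OCR binding, so the flow can be exercised with simple stubs before a camera or OCR engine is attached.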




Class-Specific Extremal Region, Head-mounted device, MSER (Maximally Stable Extremal Regions), Raspberry Pi, Tesseract OCR, Probabilistic Hough Lines Transformation