A New Approach for Generating Text Description of Images and Speech Synthesis

Research Article

Published in December 2015 by Mrunmayee Patil, Ramesh Kagalkar

National Conference on Advances in Computing

Foundation of Computer Science USA

NCAC2015 - Number 1

December 2015

Authors: Mrunmayee Patil, Ramesh Kagalkar

Mrunmayee Patil, Ramesh Kagalkar. A New Approach for Generating Text Description of Images and Speech Synthesis. National Conference on Advances in Computing. NCAC2015, 1 (December 2015), 12-17.

@article{
author = { Mrunmayee Patil and Ramesh Kagalkar },
title = { A New Approach for Generating Text Description of Images and Speech Synthesis },
journal = { National Conference on Advances in Computing },
issue_date = { December 2015 },
volume = { NCAC2015 },
number = { 1 },
month = { December },
year = { 2015 },
issn = { 0975-8887 },
pages = { 12-17 },
numpages = { 6 },
url = { /proceedings/ncac2015/number1/23355-5013/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article
%1 National Conference on Advances in Computing
%A Mrunmayee Patil
%A Ramesh Kagalkar
%T A New Approach for Generating Text Description of Images and Speech Synthesis
%J National Conference on Advances in Computing
%@ 0975-8887
%V NCAC2015
%N 1
%P 12-17
%D 2015
%I International Journal of Computer Applications

Abstract

An image can be defined as a matrix of square pixels arranged in rows and columns. Image processing enhances raw images captured by everyday devices such as cameras and mobile phones for use in various applications. An image-to-text-and-speech conversion system can make images more accessible to visually impaired and physically challenged people by helping them understand the scene an image depicts; the system is trained to interpret images much as the human brain does. Image segmentation and edge detection play an important role in implementing the proposed system. The system generates text descriptions for an input image supplied by the user. Generating sentences object by object, and mapping prepositions and conjunctions between them, is a challenging task. We formulate the interaction between image segmentation and object recognition in the framework of the Canny algorithm. The system goes through several phases: pre-processing, feature extraction, object recognition, edge detection, image segmentation, and Text To Speech (TTS) conversion. The database of the proposed system consists of a large set of sample images, drawn from several categories, which are used for training. The accuracy of the proposed system depends on correct recognition of objects, and sentences are formed from the recognized objects. The system consists of two main modules: image to text and text to speech. The image-to-text module generates natural-language text descriptions based on an understanding of the image. The text-to-speech module synthesizes English speech from the natural-language description.
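The object-wise sentence generation with preposition and conjunction mapping mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the `DetectedObject` type, the center-comparison rule in `preposition`, and the joining of clauses with "and" in `describe` are all hypothetical choices made here for illustration, and the bounding boxes are assumed to come from an earlier object recognition phase.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    """Hypothetical output of an object recognition phase (pixel units)."""
    label: str
    x: float  # left edge
    y: float  # top edge; the image origin is the top-left corner
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def preposition(a, b):
    """Choose a spatial phrase for a relative to b by comparing box centers.

    Assumed toy rule: the axis with the larger center offset decides.
    """
    ax, ay = a.center
    bx, by = b.center
    dx, dy = ax - bx, ay - by
    if abs(dx) >= abs(dy):
        return "to the left of" if dx < 0 else "to the right of"
    # y grows downward in image coordinates, so a smaller y means higher up
    return "above" if dy < 0 else "below"


def article(label):
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if label[0].lower() in "aeiou" else "a"


def describe(objects):
    """Form one clause per adjacent object pair and join them with 'and'."""
    if not objects:
        return "No objects were recognized."
    if len(objects) == 1:
        only = objects[0]
        return f"There is {article(only.label)} {only.label}."
    clauses = [
        f"the {a.label} is {preposition(a, b)} the {b.label}"
        for a, b in zip(objects, objects[1:])
    ]
    text = " and ".join(clauses)
    return text[0].upper() + text[1:] + "."


# Made-up detections, used only to exercise the sketch.
scene = [
    DetectedObject("dog", x=10, y=120, w=60, h=40),
    DetectedObject("tree", x=150, y=20, w=80, h=160),
    DetectedObject("bird", x=170, y=0, w=20, h=10),
]
print(describe(scene))
# → The dog is to the left of the tree and the tree is below the bird.
```

In a full pipeline of the kind the abstract describes, the resulting string would then be handed to the text-to-speech module; that step is omitted here because the abstract does not specify a speech engine.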

Index Terms

Computer Science

Information Sciences

Keywords

Image Processing, Image Segmentation, Speech Synthesis, Text To Speech Conversion, Edge Detection