Image Captioning Web Application using Deep Learning Algorithms

Surya R.E.; Mahalakshmi S.B.

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 185 - Number 40

Year of Publication: 2023

Authors: Surya R.E., Mahalakshmi S.B.

DOI: 10.5120/ijca2023923204

Surya R.E., Mahalakshmi S.B. Image Captioning Web Application using Deep Learning Algorithms. International Journal of Computer Applications. 185, 40 (Nov 2023), 29-33. DOI=10.5120/ijca2023923204

@article{10.5120/ijca2023923204,
  author = {Surya R.E. and Mahalakshmi S.B.},
  title = {Image Captioning Web Application using Deep Learning Algorithms},
  journal = {International Journal of Computer Applications},
  issue_date = {Nov 2023},
  volume = {185},
  number = {40},
  month = {Nov},
  year = {2023},
  issn = {0975-8887},
  pages = {29-33},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume185/number40/32953-2023923204/},
  doi = {10.5120/ijca2023923204},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

Images and visual signs play a vital role in communication and comprehension, but they pose challenges for visually impaired individuals. This paper presents a photo-to-speech application that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to caption images, with the aim of improving the quality of life of people with visual impairments. It describes the development, methodology, and evaluation of the application and demonstrates its potential to provide real-time image captions and improve accessibility. The model (VGG16) is trained on the Flickr 8k dataset and attains a BLEU score of 56%.
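The abstract reports a BLEU score of 56% but does not state which n-gram order (BLEU-1 to BLEU-4) or which implementation produced it. As a hedged illustration of how such a score is computed, the following is a minimal pure-Python sketch of corpus-level BLEU with clipped n-gram precision and a brevity penalty. It is not the authors' evaluation code; the function names `ngrams` and `corpus_bleu` are hypothetical, and captions are assumed to be already tokenised and lower-cased. Because Flickr 8k provides five reference captions per image, each hypothesis is scored against a list of references.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[List[List[str]]],
                hypotheses: List[List[str]],
                max_n: int = 4) -> float:
    """Unsmoothed corpus-level BLEU with uniform weights over 1..max_n grams.

    references[i] holds all reference captions (token lists) for image i;
    hypotheses[i] is the generated caption for image i (hypothetical layout).
    """
    clipped = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the generated captions, per order
    hyp_len = 0
    ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to the
        # hypothesis, ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Highest count of each n-gram in any single reference caption.
            max_ref: Counter = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            # Clip each hypothesis n-gram count by its maximum reference count.
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches makes the score zero.
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0
    log_precision = sum(math.log(clipped[i] / totals[i]) for i in range(max_n)) / max_n
    # Brevity penalty: penalise corpora whose captions are shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

With `max_n=1` the function computes BLEU-1, and with the default `max_n=4` it computes BLEU-4. Unsmoothed BLEU-4 falls to 0 whenever some order has no match, which is why toolkits such as NLTK provide smoothing functions; reported scores are therefore only comparable when the order and smoothing are known.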

Index Terms

Computer Science

Information Sciences

Keywords

Image captioning, Convolutional Neural Networks, Long Short-Term Memory, Visual impairment, Accessibility