Research Article

Image Captioning Web Application using Deep Learning Algorithms

by Surya R.E., Mahalakshmi S.B.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 185 - Number 40
Year of Publication: 2023
10.5120/ijca2023923204

Surya R.E., Mahalakshmi S.B. Image Captioning Web Application using Deep Learning Algorithms. International Journal of Computer Applications. 185, 40 (Nov 2023), 29-33. DOI=10.5120/ijca2023923204

@article{ 10.5120/ijca2023923204,
author = { Surya R.E. and Mahalakshmi S.B. },
title = { Image Captioning Web Application using Deep Learning Algorithms },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2023 },
volume = { 185 },
number = { 40 },
month = { Nov },
year = { 2023 },
issn = { 0975-8887 },
pages = { 29-33 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume185/number40/32953-2023923204/ },
doi = { 10.5120/ijca2023923204 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:28:16.336738+05:30
%A Surya R.E.
%A Mahalakshmi S.B.
%T Image Captioning Web Application using Deep Learning Algorithms
%J International Journal of Computer Applications
%@ 0975-8887
%V 185
%N 40
%P 29-33
%D 2023
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Images and visual signs play a vital role in communication and comprehension, but they pose challenges for visually impaired individuals. This paper presents a solution that uses Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to create a photo-to-speech application that enhances the quality of life of individuals with visual impairments. The paper describes the development, methodology, and evaluation of this application and demonstrates its potential to provide real-time image captions and improve accessibility. This work uses the Flickr 8k dataset to train the model (VGG16) and attains a BLEU score of 56%.
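The abstract reports "a BLEU score of 56%" without saying which n-gram order (BLEU-1 through BLEU-4) or which implementation was used. To make the metric concrete, here is a minimal sketch of corpus-level BLEU following the standard definition: clipped n-gram precisions p_n, combined as a geometric mean with uniform weights, multiplied by a brevity penalty BP, i.e. BLEU = BP * exp((1/N) * sum log p_n). The function name corpus_bleu, the tokenized input format, and the absence of smoothing are assumptions made for illustration. This is not the authors' evaluation code; libraries such as NLTK ship their own corpus_bleu.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, hypotheses, max_n=4):
    """Hypothetical corpus-level BLEU sketch (uniform weights, no smoothing).

    references: one entry per image, each a list of reference token lists
                (Flickr 8k provides several captions per image)
    hypotheses: generated caption token lists, aligned with references
    """
    clipped = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams in the candidates, per order n
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Effective reference length: the reference closest in length to
        # the candidate, ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Counter union keeps the maximum count of each n-gram over
            # all references; candidate counts are clipped to it.
            max_ref = Counter()
            for r in refs:
                max_ref |= ngrams(r, n)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any order with zero matches makes BLEU zero
    # (this also covers empty candidates, avoiding division by zero).
    if min(clipped) == 0:
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: punish candidates shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


refs = [[["a", "dog", "runs", "on", "the", "grass"]]]
print(corpus_bleu(refs, [["a", "dog", "runs", "on", "the", "grass"]]))
print(corpus_bleu(refs, [["a", "dog", "runs", "on", "grass"]], max_n=1))
```

Passing max_n=1 yields BLEU-1 and the default yields BLEU-4. The two can differ considerably for the same captions, so a reported BLEU figure is only comparable with other work when the n-gram order is stated.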

Index Terms

Computer Science
Information Sciences

Keywords

Image captioning, Convolutional Neural Networks, Long Short-Term Memory, Visual impairment, Accessibility.