Research Article

An Efficient Hybrid Architecture for Visual Behavior Recognition using Convolutional Neural Network

by Md Javedul Ferdous, A. F. M. Saifuddin Saif, Dip Nandi, Mashiour Rahman
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 181 - Number 25
Year of Publication: 2018
DOI: 10.5120/ijca2018918048

Md Javedul Ferdous, A. F. M. Saifuddin Saif, Dip Nandi, Mashiour Rahman. An Efficient Hybrid Architecture for Visual Behavior Recognition using Convolutional Neural Network. International Journal of Computer Applications. 181, 25 (Nov 2018), 32-37. DOI=10.5120/ijca2018918048

@article{ 10.5120/ijca2018918048,
author = { Md Javedul Ferdous and A. F. M. Saifuddin Saif and Dip Nandi and Mashiour Rahman },
title = { An Efficient Hybrid Architecture for Visual Behavior Recognition using Convolutional Neural Network },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2018 },
volume = { 181 },
number = { 25 },
month = { Nov },
year = { 2018 },
issn = { 0975-8887 },
pages = { 32-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume181/number25/30094-2018918048/ },
doi = { 10.5120/ijca2018918048 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:07:06.895156+05:30
%A Md Javedul Ferdous
%A A. F. M. Saifuddin Saif
%A Dip Nandi
%A Mashiour Rahman
%T An Efficient Hybrid Architecture for Visual Behavior Recognition using Convolutional Neural Network
%J International Journal of Computer Applications
%@ 0975-8887
%V 181
%N 25
%P 32-37
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The purpose of this research is to understand visual behavior from images. Computer vision is a research area with great potential, and visual behavior detection, which connects image captioning and object detection, has begun to attract researchers' attention because of its descriptive power and clear structure in terms of accuracy. With the progress of deep learning, enabling a computer to comprehend an image appears increasingly within reach. As research on object recognition matures, more researchers are turning their attention to higher-level understanding of the scene. Object detection and visual context now receive more attention as an intermediate stage in scene understanding. The goal of this research is to discover the visual relationships between objects in a given image and to understand the whole scenario. This research presents a framework for this problem. The proposed approach performs object detection using a convolutional neural network (CNN) and focuses on relationships that can be generated by a long short-term memory (LSTM) network. The framework is designed to combine the convolutional neural network with the LSTM architecture. The proposed framework is validated on the COCO dataset and achieves a BLEU-4 score of 23.5, which shows better efficiency than previous research methods.
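
For readers unfamiliar with the BLEU-4 metric reported above, the following is a minimal sketch of how a sentence-level BLEU-4 score can be computed. It is not the authors' evaluation code: the function names `ngrams` and `bleu4` are hypothetical, the version shown is unsmoothed and scores one sentence at a time, and captioning benchmarks typically aggregate counts over the whole corpus and report the result multiplied by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Unsmoothed sentence-level BLEU-4 (illustrative helper, not from the paper).

    candidate: list of tokens; references: list of token lists.
    Returns a value in [0, 1].
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the four modified precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / 4)


candidate = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
score = bleu4(candidate, [reference])
print(round(100 * score, 1))
```

In this example the modified precisions for 1-grams through 4-grams are 5/6, 3/5, 2/4 and 1/3, and the brevity penalty is 1 because the candidate and reference have the same length.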

References
  1. Lu, C., Krishna, R., Bernstein, M. and Fei-Fei, L., Visual relationship detection with language priors. In European Conference on Computer Vision, Springer Cham, (pp. 852-869), 2016.
  2. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R. and LeCun, Y., Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229, 2013.
  3. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M. and Berg, A.C., Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), pp.211-252, 2015.
  4. Gupta, A. and Mannem, P., From image annotation to image description. In International Conference on Neural Information Processing. Springer, (pp. 196-204), 2012.
  5. Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K. and Darrell, T., Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2625-2634), 2015.
  6. Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B. and Lee, H., Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.
  7. Li, Y., Ouyang, W. and Wang, X., Vip-cnn: Visual phrase guided convolutional neural network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 7244-7253), 2017.
  8. Xu, D., Zhu, Y., Choy, C.B. and Fei-Fei, L., Scene graph generation by iterative message passing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Vol. 2), 2017.
  9. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R. and Bengio, Y., Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning (pp. 2048-2057), 2015.
  10. Ramanathan, V., Li, C., Deng, J., Han, W., Li, Z., Gu, K., Song, Y., Bengio, S., Rosenberg, C. and Fei-Fei, L., Learning semantic relationships for better action retrieval in images. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1100-1109), 2015.
  11. Galleguillos, C., Rabinovich, A. and Belongie, S., Object categorization using co-occurrence, location and appearance. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1-8), 2008.
  12. Ladicky, L., Russell, C., Kohli, P. and Torr, P.H., Graph cut based inference with co-occurrence statistics. In European Conference on Computer Vision (pp. 239-253). Springer, 2010.
  13. Rabinovich, A., Vedaldi, A., Galleguillos, C., Wiewiora, E. and Belongie, S., Objects in context. IEEE 11th international conference Computer vision, ICCV 2007 (pp. 1-8). 2007.
  14. Salakhutdinov, R., Torralba, A. and Tenenbaum, J., Learning to share visual appearance for multiclass object detection. In Computer Vision and Pattern Recognition (CVPR), IEEE Conference on (pp. 1481-1488), 2011.
  15. Jia, Z., Gallagher, A., Saxena, A. and Chen, T., 3d-based reasoning with blocks, support, and stability. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on (pp. 1-8), 2013.
  16. Silberman, N., Hoiem, D., Kohli, P. and Fergus, R., Indoor segmentation and support inference from rgbd images. In European Conference on Computer Vision (pp. 746-760). Springer, 2012.
  17. Zheng, B., Zhao, Y., Yu, J., Ikeuchi, K. and Zhu, S.C., Scene understanding by reasoning stability and safety. International Journal of Computer Vision, 112(2), pp.221-238, 2015.
  18. Elamri, C. and de Planque, T., Automated Neural Image Caption Generator for Visually Impaired People, 2016.
  19. Chen, J., Dong, W. and Li, M., Image Caption Generator Based On Deep Neural Networks.
  20. Karpathy, A., Joulin, A. and Fei-Fei, L.F., Deep fragment embeddings for bidirectional image sentence mapping. In Advances in neural information processing systems (pp. 1889-1897), 2014.
  21. Zhang, H., Kyaw, Z., Chang, S.F. and Chua, T.S., Visual translation embedding network for visual relation detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Vol. 2, No. 3, p. 4), 2017.
  22. Thomason, J., Venugopalan, S., Guadarrama, S., Saenko, K. and Mooney, R., Integrating language and vision to generate natural language descriptions of videos in the wild. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers (pp. 1218-1227), 2014.
  23. Ordonez, V., Kulkarni, G. and Berg, T.L., Im2text: Describing images using 1 million captioned photographs. In Advances in neural information processing systems (pp. 1143-1151), 2011.
  24. Vinyals, O., Toshev, A., Bengio, S. and Erhan, D., Show and tell: A neural image caption generator. In Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on (pp. 3156-3164), 2015.
  25. Karpathy, A. and Fei-Fei, L., Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3128-3137), 2015.
  26. Lu, J., Xiong, C., Parikh, D. and Socher, R., Knowing when to look: Adaptive attention via a visual sentinel for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Vol. 6, p. 2), 2017.
  27. Papineni, K., Roukos, S., Ward, T. and Zhu, W.J., BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics. Association for Computational Linguistics, (pp. 311-318), 2002.
  28. Lin, C.Y., Rouge: A package for automatic evaluation of summaries. Text Summarization Branches Out, 2004.
  29. Vedantam, R., Lawrence Zitnick, C. and Parikh, D., Cider: Consensus-based image description evaluation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4566-4575), 2015.
  30. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., Microsoft coco: Common objects in context. In European conference on computer vision, Springer, Cham, (pp. 740-755), 2014.
  31. Mikolov, T., Chen, K., Corrado, G. and Dean, J., Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
Index Terms

Computer Science
Information Sciences

Keywords

CNN, Deep learning, LSTM, Object detection, Scene graph, Visual behavior