
A Performance Analysis of Face and Speech Recognition in the Video and Audio Stream using Machine Learning Classification Techniques

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2021
Chetan Sharma, Rajdeep Singh

Chetan Sharma and Rajdeep Singh. A Performance Analysis of Face and Speech Recognition in the Video and Audio Stream using Machine Learning Classification Techniques. International Journal of Computer Applications 183(13):41-46, July 2021. BibTeX

	author = {Chetan Sharma and Rajdeep Singh},
	title = {A Performance Analysis of Face and Speech Recognition in the Video and Audio Stream using Machine Learning Classification Techniques},
	journal = {International Journal of Computer Applications},
	issue_date = {July 2021},
	volume = {183},
	number = {13},
	month = {Jul},
	year = {2021},
	issn = {0975-8887},
	pages = {41-46},
	numpages = {6},
	url = {},
	doi = {10.5120/ijca2021921447},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Biometric authentication is an emerging technology that uses biometric data to identify or recognize a person in security applications. Many biometrics can be used in a person authentication system. Among the widely used ones, voice and face traits are the most promising for pervasive use in everyday life, because they can be obtained easily through unobtrusive, user-friendly procedures. The low-cost audio and visual capture sensors on smartphones, laptops, and tablets have made the advantages of voice and face biometrics stand out further against other traits. For a long time, acoustic information alone has been used with great success in speaker authentication applications. Meanwhile, the last decade or two has also witnessed great advances in face recognition technologies. Object detection and tracking is usually the first step in applications such as video surveillance. The main purpose of the static-camera face recognition and tracking system is to estimate speed and distance parameters. We propose a general method for detecting and tracking motion that is based on the visual system and uses the image difference algorithm. The person's voice is then recognized to obtain feedback from the corresponding person. The process first detects people on stage and then performs the voice signal processing. We propose a new person recognition technique that fuses face and voice; compared with recognition based on a single biometric, it can greatly improve recognition speed. The security system is developed using the Viola-Jones algorithm for face detection. The proposed method uses the Local Binary Pattern (LBP) as the feature extraction technique to compute local features. For speech recognition, our project uses Mel Frequency Cepstral Coefficient (MFCC) feature extraction. The extracted features are used as input to a multi-SVM classifier to determine gender and identify individuals, and the results are displayed. The new system can be used in various areas, such as identity verification and other potential commercial applications.
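
Two of the building blocks named in the abstract, image differencing for motion detection and LBP feature extraction, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names (`motion_mask`, `lbp_code`, `lbp_histogram`), the difference threshold of 25, the basic 3x3 LBP neighbourhood, and the representation of a grayscale frame as nested lists. It is not the authors' implementation.

```python
from typing import List

# Assumption: a grayscale frame is a list of rows of 0-255 intensities.
Image = List[List[int]]


def motion_mask(prev: Image, curr: Image, threshold: int = 25) -> Image:
    """Image differencing: flag pixels whose intensity changed by more
    than `threshold` between two frames (threshold value is assumed)."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


# Offsets of the 8 neighbours, clockwise starting at the top-left pixel.
_NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, r: int, c: int) -> int:
    """Basic 3x3 LBP code of the interior pixel at (r, c): one bit per
    neighbour, set when the neighbour is >= the centre intensity."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(_NEIGHBOURS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img: Image) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels; a
    histogram like this can serve as a local feature vector."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

In a pipeline like the one described, the motion mask would narrow down where to look for a face, and the LBP histogram of a detected face region would be one of the feature vectors passed to a classifier such as the multi-SVM mentioned above. A practical system would more likely rely on an image-processing library than on pure-Python loops.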




SVM, KNN, LBP, Machine Learning, Viola Jones