A Performance Analysis of Face and Speech Recognition in the Video and Audio Stream using Machine Learning Classification Techniques

Chetan Sharma; Rajdeep Singh

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 183 - Number 13

Year of Publication: 2021

Authors: Chetan Sharma, Rajdeep Singh

DOI: 10.5120/ijca2021921447

Chetan Sharma, Rajdeep Singh. A Performance Analysis of Face and Speech Recognition in the Video and Audio Stream using Machine Learning Classification Techniques. International Journal of Computer Applications. 183, 13 (Jul 2021), 41-46. DOI=10.5120/ijca2021921447

@article{ 10.5120/ijca2021921447,
author = { Chetan Sharma and Rajdeep Singh },
title = { A Performance Analysis of Face and Speech Recognition in the Video and Audio Stream using Machine Learning Classification Techniques },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2021 },
volume = { 183 },
number = { 13 },
month = { Jul },
year = { 2021 },
issn = { 0975-8887 },
pages = { 41-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume183/number13/31990-2021921447/ },
doi = { 10.5120/ijca2021921447 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T01:16:44.221465+05:30
%A Chetan Sharma
%A Rajdeep Singh
%T A Performance Analysis of Face and Speech Recognition in the Video and Audio Stream using Machine Learning Classification Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 13
%P 41-46
%D 2021
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Biometric authentication is an emerging technology that uses biometric data to identify or recognize people in security applications. Many biometric traits can be used in a person authentication system. Among the widely used ones, voice and face are the most promising for pervasive use in everyday life, because they can be captured with unobtrusive, user-friendly procedures. Low-cost audio and visual capture sensors on smartphones, laptops, and tablets have made the advantages of voice and face biometrics more pronounced than those of other traits. For a long time, acoustic information alone has been used successfully for speaker authentication. Meanwhile, the last decade or two have also seen great advances in face recognition technology. Object detection and tracking is usually the first step in applications such as video surveillance. The main purpose of the static-camera face recognition and tracking system is to estimate speed and distance parameters. We propose a general motion detection and tracking method based on the visual system and an image difference algorithm. The system then recognizes the person's voice to obtain feedback from the corresponding person. The process first detects people on stage and then performs the voice signal processing. We propose a new person recognition technique that fuses face and voice; compared with recognition based on a single biometric, this technique can greatly improve recognition speed. The security system is developed using the Viola-Jones face detection algorithm. The proposed method uses the Local Binary Pattern (LBP) as a feature extraction technique to compute local features, and Mel-Frequency Cepstral Coefficient (MFCC) extraction for speech recognition. The extracted features are used as input to a multi-SVM classifier, which determines gender, identifies individuals, and displays the results. The new system can be used in various areas, such as identity verification and other potential commercial applications.
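To make two of the steps named in the abstract concrete, the following is a minimal sketch of image differencing for motion detection and of a basic 8-neighbour LBP histogram. It is not the authors' implementation: the function names (`motion_mask`, `lbp_code`, `lbp_histogram`), the threshold of 25, and the clockwise neighbour ordering are assumptions chosen for illustration, and grayscale frames are represented as plain nested lists so that the example needs no external libraries.

```python
def motion_mask(prev_frame, curr_frame, threshold=25):
    """Image differencing: mark pixels whose intensity changed by more
    than `threshold` between two grayscale frames (assumed value)."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Neighbours visited clockwise, starting at the top-left pixel (assumed order).
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Basic 3x3 LBP code of the interior pixel at (r, c): each neighbour
    that is >= the centre pixel sets one bit of an 8-bit code."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(_OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes over all interior pixels.
    A histogram like this could serve as a feature vector for a classifier."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    previous = [[0, 0], [0, 0]]
    current = [[0, 100], [10, 0]]
    print(motion_mask(previous, current))

    patch = [[9, 0, 0], [0, 5, 0], [0, 0, 0]]
    print(lbp_code(patch, 1, 1))
```

In a full pipeline of the kind the abstract describes, the mask would narrow down where to look for a face, and the LBP histogram of the detected face region would be passed, together with the audio features, to the classifier. Those later stages are omitted here.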

Index Terms

Computer Science

Information Sciences

Keywords

SVM, KNN, LBP, Machine Learning, Viola-Jones