Research Article

A Novel Approach of Classifying and Recognizing the Audio Scenario's Profile

Published in December 2015 by Ajay Kadam and Ramesh M. Kagalkar
National Conference on Advances in Computing
Foundation of Computer Science USA
NCAC2015 - Number 1
December 2015
Authors: Ajay Kadam, Ramesh M. Kagalkar

Ajay Kadam, Ramesh M. Kagalkar. A Novel Approach of Classifying and Recognizing the Audio Scenario's Profile. National Conference on Advances in Computing. NCAC2015, 1 (December 2015), 39-43.

@article{
author = { Ajay Kadam, Ramesh M. Kagalkar },
title = { A Novel Approach of Classifying and Recognizing the Audio Scenario's Profile },
journal = { National Conference on Advances in Computing },
issue_date = { December 2015 },
volume = { NCAC2015 },
number = { 1 },
month = { December },
year = { 2015 },
issn = { 0975-8887 },
pages = { 39-43 },
numpages = { 5 },
url = { /proceedings/ncac2015/number1/23360-5019/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Advances in Computing
%A Ajay Kadam
%A Ramesh M. Kagalkar
%T A Novel Approach of Classifying and Recognizing the Audio Scenario's Profile
%J National Conference on Advances in Computing
%@ 0975-8887
%V NCAC2015
%N 1
%P 39-43
%D 2015
%I International Journal of Computer Applications
Abstract

Even a small sound clip can provide a great deal of information: the background against which it was captured, the different types of sounds it contains, and so on. A human listener can recognize the sounds in a clip only if he or she has heard them before, and even then only up to a point; beyond a certain number of sounds, a listener becomes confused and can no longer distinguish them. The developed system therefore stores different sound samples in a database and recognizes a sample when it appears again as input. The aim of the system is to identify the sounds in a given input clip, compare the extracted features with the database samples, and generate a proper text description of each relevant sound together with an image. The system converts the audio piece into a byte array and then transforms the time-domain values into the frequency domain using a modified FFT formula. The data is processed in chunks of 4096 bytes, and the top four values of each chunk are either stored in the database (insertion) or matched against it (comparison). The database is the basic need of the system: it holds over 800 sound samples in two distinct classes, indoor and outdoor, and further samples can be added by manual selection as well as by recording in real time. The system presents its results descriptively for the different types of samples. Its accuracy is 94% in offline mode, while in online mode the accuracy degrades to 50% because of noise.
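The pipeline described above (byte array, time-to-frequency conversion, 4096-byte chunks, top four spectral values per chunk stored or matched) amounts to a compact audio-fingerprinting scheme. The following Python sketch illustrates that idea under stated assumptions: it uses NumPy's standard FFT rather than the paper's modified FFT formula, and the band layout and all names (CHUNK_SIZE, BANDS, fingerprint_chunk, match_clip) are illustrative, not the authors' implementation.

import numpy as np

CHUNK_SIZE = 4096  # bytes per chunk, as stated in the abstract
# Four frequency bands; the strongest bin in each yields the chunk's
# "top four values". The band edges here are an assumption.
BANDS = [(40, 80), (80, 120), (120, 180), (180, 300)]

def fingerprint_chunk(chunk):
    """Map one 4096-byte chunk to a tuple of four spectral peak indices."""
    samples = np.frombuffer(chunk, dtype=np.int16).astype(np.float64)
    spectrum = np.abs(np.fft.rfft(samples))  # time domain -> frequency domain
    return tuple(lo + int(np.argmax(spectrum[lo:hi])) for lo, hi in BANDS)

def fingerprint_clip(raw):
    """Fingerprint every complete 4096-byte chunk of a clip."""
    return [fingerprint_chunk(raw[i:i + CHUNK_SIZE])
            for i in range(0, len(raw) - CHUNK_SIZE + 1, CHUNK_SIZE)]

def match_clip(raw, database):
    """Return the stored label whose fingerprints overlap the clip most.

    database maps a text description (e.g. "dog bark, outdoor") to the
    set of chunk fingerprints recorded when that sample was inserted.
    """
    prints = fingerprint_clip(raw)
    scores = {label: sum(fp in stored for fp in prints)
              for label, stored in database.items()}
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None

Insertion of a new sample would follow the same path: fingerprint the clip and store the resulting tuples under its text description, which is what lets the comparison step emit a description for a matching input.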

Index Terms

Computer Science
Information Sciences

Keywords

Fingerprinting, Pure Tone, White Noise