Research Article

Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure

by M. Babul Islam
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 182 - Number 45
Year of Publication: 2019
Authors: M. Babul Islam
10.5120/ijca2019918600

M. Babul Islam. Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure. International Journal of Computer Applications. 182, 45 (Mar 2019), 1-4. DOI=10.5120/ijca2019918600

@article{ 10.5120/ijca2019918600,
author = { M. Babul Islam },
title = { Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure },
journal = { International Journal of Computer Applications },
issue_date = { Mar 2019 },
volume = { 182 },
number = { 45 },
month = { Mar },
year = { 2019 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume182/number45/30450-2019918600/ },
doi = { 10.5120/ijca2019918600 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A M. Babul Islam
%T Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 45
%P 1-4
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In this paper, a Mel-scaled autoregressive (Mel-AR) model-based voice activity detector (VAD) is presented, in which a likelihood ratio measure is used to classify input speech frames as speech or non-speech. The Mel-AR model parameters are estimated directly from the input speech signal on the linear frequency scale, without applying a bilinear transformation, by replacing the unit delay with a first-order all-pass filter. The performance of the proposed VAD is evaluated on the Aurora-2 database in terms of FAR and FRR, and the equal false rate (EFR) at the crossover point is also reported as a figure of merit for the VAD. In addition, the usefulness of the proposed VAD for speech recognition is verified by combining it with a Mel-Wiener filter for MLPC-based recognition of noisy speech.
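The abstract describes estimating Mel-AR parameters directly from the signal by replacing the unit delay with a first-order all-pass filter, and then classifying frames with a likelihood ratio measure. The sketch below illustrates one way to do this in plain Python. It assumes a warped autocorrelation in the style of warped linear prediction (Strube), followed by the Levinson-Durbin recursion, and uses Itakura's likelihood ratio distortion between a noise AR model and each frame's own model. The warping factor `alpha = 0.31` (a value commonly used to approximate the mel scale at the 8 kHz rate of Aurora-2), the order 12, the threshold, and all function names are illustrative assumptions, not values from the paper. The paper's exact Mel-LPC estimator and decision rule may differ, for example in frequency weighting or smoothing.

```python
import math
import random


def allpass(x, alpha):
    """First-order all-pass filter A(z) = (z^-1 - alpha) / (1 - alpha z^-1).

    Difference equation: y[n] = x[n-1] - alpha * x[n] + alpha * y[n-1],
    with zero initial conditions. With alpha = 0 it reduces to a unit delay.
    """
    y = []
    x_prev, y_prev = 0.0, 0.0
    for xn in x:
        yn = x_prev - alpha * xn + alpha * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y


def warped_autocorr(frame, order, alpha):
    """Warped autocorrelation r[k] = sum_n x[n] * y_k[n] for k = 0..order,
    where y_0 = x and y_k = A(z) y_{k-1}: every unit delay of the ordinary
    autocorrelation is replaced by the all-pass filter. Summing only over
    the frame is exact because x[n] is zero outside it."""
    r = []
    y = list(frame)
    for _ in range(order + 1):
        r.append(sum(xn * yn for xn, yn in zip(frame, y)))
        y = allpass(y, alpha)
    return r


def levinson(r, order):
    """Levinson-Durbin recursion. Returns (a, err) with a[0] = 1 and
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p; assumes r[0] > 0."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error
    return a, err


def quad_form(a, r):
    """a^T R a for the symmetric Toeplitz matrix R built from r."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))


def lr_distortion(frame, noise_a, order=12, alpha=0.31):
    """Likelihood ratio distortion (a_n^T R a_n) / (a_x^T R a_x) - 1, where
    R comes from the frame's warped autocorrelation, a_x is the frame's own
    Mel-AR model and a_n is the noise model. It is >= 0 because a_x
    minimises the residual energy a^T R a subject to a[0] = 1."""
    r = warped_autocorr(frame, order, alpha)
    a_x, _ = levinson(r, order)
    return quad_form(noise_a, r) / quad_form(a_x, r) - 1.0


def is_speech(frame, noise_a, threshold=0.5, order=12, alpha=0.31):
    """Hypothetical decision rule: declare speech when the frame's model
    departs from the noise model by more than `threshold` (untuned)."""
    return lr_distortion(frame, noise_a, order, alpha) > threshold


if __name__ == "__main__":
    # Toy demo: noise model from a white-noise frame, then score a tonal frame.
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(200)]
    noise_a, _ = levinson(warped_autocorr(noise, 12, 0.31), 12)
    tone = [3.0 * math.sin(2 * math.pi * 0.05 * n) + random.gauss(0.0, 1.0)
            for n in range(200)]
    print(lr_distortion(tone, noise_a), is_speech(tone, noise_a))
```

In practice the noise model would be estimated from frames assumed to be non-speech (for instance, the leading frames of an utterance), and the threshold would be tuned on held-out data; both choices are assumptions of this sketch.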
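The abstract evaluates the detector with FAR and FRR and reports the equal false rate (EFR) at their crossover point. The minimal sketch below shows how such figures can be computed from frame-level detector scores. It assumes that larger scores mean "more speech-like", that reference labels are 1 for speech and 0 for non-speech, that FAR is the fraction of non-speech frames declared speech, and that FRR is the fraction of speech frames declared non-speech. The EFR is taken as the mean of FAR and FRR at the swept threshold where they are closest. These definitions and the function names `far_frr` and `equal_false_rate` are assumptions for illustration; the paper's exact definitions may differ.

```python
def far_frr(scores, labels, threshold):
    """Return (FAR, FRR) when a frame is declared speech iff score > threshold.

    Assumed definitions (labels: 1 = speech, 0 = non-speech):
      FAR = non-speech frames declared speech / all non-speech frames
      FRR = speech frames declared non-speech / all speech frames
    Both classes must be present, otherwise a ZeroDivisionError is raised.
    """
    speech = [s for s, l in zip(scores, labels) if l == 1]
    nonspeech = [s for s, l in zip(scores, labels) if l == 0]
    far = sum(s > threshold for s in nonspeech) / len(nonspeech)
    frr = sum(s <= threshold for s in speech) / len(speech)
    return far, frr


def equal_false_rate(scores, labels):
    """Sweep the threshold over the observed scores and return
    (threshold, EFR) at the first point where |FAR - FRR| is smallest,
    with EFR taken as the mean of FAR and FRR there."""
    best = None
    for t in sorted(set(scores)):
        far, frr = far_frr(scores, labels, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2.0)
    return best[1], best[2]
```

For example, `equal_false_rate([0.1, 0.5, 0.2, 0.7, 0.4, 0.6, 0.8, 0.9], [0, 0, 0, 0, 1, 1, 1, 1])` returns `(0.5, 0.25)`, since at that threshold one of four non-speech frames is accepted and one of four speech frames is rejected. Because the sweep summarizes the whole FAR/FRR trade-off in a single number, the EFR is convenient for comparing detectors without fixing an operating threshold.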

Index Terms

Computer Science
Information Sciences

Keywords

VAD, Mel-AR model, Likelihood ratio, Itakura-Saito distortion, Aurora 2 database