Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure

M. Babul Islam

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 182 - Number 45

Year of Publication: 2019

Authors: M. Babul Islam

10.5120/ijca2019918600

M. Babul Islam. Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure. International Journal of Computer Applications. 182, 45 (Mar 2019), 1-4. DOI=10.5120/ijca2019918600

@article{10.5120/ijca2019918600,
  author = {M. Babul Islam},
  title = {Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure},
  journal = {International Journal of Computer Applications},
  issue_date = {Mar 2019},
  volume = {182},
  number = {45},
  month = {Mar},
  year = {2019},
  issn = {0975-8887},
  pages = {1-4},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume182/number45/30450-2019918600/},
  doi = {10.5120/ijca2019918600},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T01:14:17.807321+05:30
%A M. Babul Islam
%T Mel-Scaled Autoregressive (Mel-AR) Model based Voice Activity Detection using Likelihood Ratio Measure
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 45
%P 1-4
%D 2019
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents a voice activity detector (VAD) based on a Mel-scaled autoregressive (Mel-AR) model, in which a likelihood ratio measure classifies each input frame as speech or non-speech. The Mel-AR model parameters are estimated directly from the input speech signal on the linear frequency scale, without applying a bilinear transformation; this is achieved by using a first-order all-pass filter in place of the unit delay. The performance of the proposed VAD is evaluated on the Aurora-2 database in terms of FAR and FRR, and the equal false rate (EFR) at their crossover point is reported as a figure of merit. In addition, the VAD is verified in a speech recognition setting by combining it with a Mel-Wiener filter for MLPC-based noisy speech recognition.
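As an illustration only, the two ideas in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation. The first function gives the frequency warping produced by a first-order all-pass filter (z^-1 - alpha) / (1 - alpha z^-1); the warping factor alpha = 0.35 used in the usage note is an assumed value, not one taken from this page. The second function sweeps a threshold over per-frame scores and finds the crossover of FAR and FRR; here FAR is assumed to be the fraction of non-speech frames accepted as speech and FRR the fraction of speech frames rejected, and the function names, scores, and labels are hypothetical.

```python
import math


def warped_frequency(omega, alpha):
    """Warped frequency (radians) of a first-order all-pass filter.

    Substituting (z^-1 - alpha) / (1 - alpha z^-1) for the unit delay
    maps the linear frequency omega to the value returned here.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def equal_false_rate(scores, is_speech):
    """Return (threshold, FAR, FRR) where |FAR - FRR| is smallest.

    Assumed definitions (not taken from the paper):
      FAR = non-speech frames accepted as speech / all non-speech frames
      FRR = speech frames rejected / all speech frames
    A frame is accepted as speech when its score >= threshold.
    """
    n_speech = sum(is_speech)
    n_noise = len(is_speech) - n_speech
    best = None
    for thr in sorted(set(scores)):
        decisions = [s >= thr for s in scores]
        far = sum(d and not y for d, y in zip(decisions, is_speech)) / n_noise
        frr = sum((not d) and y for d, y in zip(decisions, is_speech)) / n_speech
        if best is None or abs(far - frr) < abs(best[1] - best[2]):
            best = (thr, far, frr)
    return best


# Hypothetical per-frame scores (e.g. a likelihood ratio) and reference labels.
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.3]
labels = [False, False, True, True, True, False]
thr, far, frr = equal_false_rate(scores, labels)
```

With a positive alpha the warped frequency exceeds the linear one between 0 and pi, so low frequencies are stretched in a Mel-like way, while the endpoints 0 and pi map to themselves. The threshold sweep is a brute-force search, which is adequate for a sketch but quadratic in the number of frames.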

Index Terms

Computer Science

Information Sciences

Keywords

VAD, Mel-AR model, Likelihood ratio, Itakura-Saito distortion, Aurora-2 database