Audio Replay Attack Detection in Automated Speaker Verification

Pooja Anjee; Shubham Ghosh; Shrirag Kodoor; Rajashree Shettar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 179 - Number 41

Year of Publication: 2018

Authors: Pooja Anjee, Shubham Ghosh, Shrirag Kodoor, Rajashree Shettar

10.5120/ijca2018916986

Pooja Anjee, Shubham Ghosh, Shrirag Kodoor, Rajashree Shettar. Audio Replay Attack Detection in Automated Speaker Verification. International Journal of Computer Applications 179, 41 (May 2018), 44-48. DOI=10.5120/ijca2018916986

@article{10.5120/ijca2018916986,
  author = {Pooja Anjee and Shubham Ghosh and Shrirag Kodoor and Rajashree Shettar},
  title = {Audio Replay Attack Detection in Automated Speaker Verification},
  journal = {International Journal of Computer Applications},
  issue_date = {May 2018},
  volume = {179},
  number = {41},
  month = {May},
  year = {2018},
  issn = {0975-8887},
  pages = {44-48},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume179/number41/29359-2018916986/},
  doi = {10.5120/ijca2018916986},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:58:06.642117+05:30
%A Pooja Anjee
%A Shubham Ghosh
%A Shrirag Kodoor
%A Rajashree Shettar
%T Audio Replay Attack Detection in Automated Speaker Verification
%J International Journal of Computer Applications
%@ 0975-8887
%V 179
%N 41
%P 44-48
%D 2018
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Automated Speaker Verification (ASV) systems are widely used for authentication and verification. Countermeasures are developed to protect ASV systems from audio replay attacks. This paper describes the ASVspoof2017 database, gives a conceptual analysis of several classification algorithms, and reports their prediction results. Feature extraction is based on the recently introduced Constant Q Transform (CQT), a perceptually motivated time-frequency analysis tool used mainly with audio samples. The training dataset comprises 1508 genuine samples and 1508 spoof samples. Variants of boosted decision trees achieved a training accuracy of 84.4%. Parameters such as the learning rate, the number of learners, and the number of splits were optimized empirically. LogitBoost outperformed AdaBoost on all metrics. Furthermore, a neural network with a single hidden layer achieved a training accuracy of 92.1%. A comparison of the algorithms revealed that, while the neural network achieved a higher overall training accuracy, it had a lower True Negative Rate than LogitBoost. Overall, the paper describes a generalized system capable of detecting replay attacks in known and unknown conditions.
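
To illustrate why the CQT is described as perceptually mapped, the following minimal sketch computes the geometrically spaced bin centre frequencies, the constant quality factor Q, and the per-bin window lengths that define the transform. It is not the authors' feature extractor: the function names and the parameter values in the usage note (minimum frequency, bins per octave, sample rate) are assumptions chosen only for illustration.

```python
import math


def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Return geometrically spaced bin centres f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_quality_factor(bins_per_octave):
    """Return Q = f_k / bandwidth_k, which is the same for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def cqt_window_lengths(f_min, bins_per_octave, n_bins, sample_rate):
    """Return the analysis window length in samples for each bin.

    N_k = Q * sample_rate / f_k, so low-frequency bins use long windows
    (fine frequency resolution) and high-frequency bins use short windows
    (fine time resolution).
    """
    q = cqt_quality_factor(bins_per_octave)
    freqs = cqt_center_frequencies(f_min, bins_per_octave, n_bins)
    return [math.ceil(q * sample_rate / f) for f in freqs]


if __name__ == "__main__":
    # Hypothetical settings, not taken from the paper.
    F_MIN = 55.0          # assumed lowest analysed frequency in Hz
    BINS_PER_OCTAVE = 12  # assumed resolution
    N_BINS = 25           # two octaves plus the top edge
    SAMPLE_RATE = 16000   # assumed sample rate in Hz

    freqs = cqt_center_frequencies(F_MIN, BINS_PER_OCTAVE, N_BINS)
    lengths = cqt_window_lengths(F_MIN, BINS_PER_OCTAVE, N_BINS, SAMPLE_RATE)
    print("Q =", round(cqt_quality_factor(BINS_PER_OCTAVE), 3))
    for f, n in list(zip(freqs, lengths))[::12]:
        print(f"{f:8.2f} Hz -> window of {n} samples")
```

Because the centre frequencies double every octave while Q stays fixed, the frequency resolution is finer at low frequencies and the time resolution is finer at high frequencies, which is the sense in which the analysis follows a perceptual scale.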

Index Terms

Computer Science

Information Sciences

Keywords

Replay attack detection; Automated speaker verification; Classification of speech samples