Text-Independent Speaker Identification using Audio Looping with Margin based Loss Functions

Elliot Q.C. Garcia; Nicéias Silva Vilela; Kátia Pires Nascimento do Sacramento; Tiago A.E. Ferreira

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Number 59

Year of Publication: 2025

Authors: Elliot Q.C. Garcia, Nicéias Silva Vilela, Kátia Pires Nascimento do Sacramento, Tiago A.E. Ferreira

10.5120/ijca2025925961

Elliot Q.C. Garcia, Nicéias Silva Vilela, Kátia Pires Nascimento do Sacramento, Tiago A.E. Ferreira. Text-Independent Speaker Identification using Audio Looping with Margin based Loss Functions. International Journal of Computer Applications. 187, 59 (Nov 2025), 1-8. DOI=10.5120/ijca2025925961

@article{ 10.5120/ijca2025925961,
  author = { Elliot Q.C. Garcia and Nicéias Silva Vilela and Kátia Pires Nascimento do Sacramento and Tiago A.E. Ferreira },
  title = { Text-Independent Speaker Identification using Audio Looping with Margin based Loss Functions },
  journal = { International Journal of Computer Applications },
  issue_date = { Nov 2025 },
  volume = { 187 },
  number = { 59 },
  month = { Nov },
  year = { 2025 },
  issn = { 0975-8887 },
  pages = { 1-8 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume187/number59/text-independent-speaker-identification-using-audio-looping-with-margin-based-loss-functions/ },
  doi = { 10.5120/ijca2025925961 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2025-11-18T21:11:27+05:30

%A Elliot Q.C. Garcia

%A Nicéias Silva Vilela

%A Kátia Pires Nascimento do Sacramento

%A Tiago A.E. Ferreira

%T Text-Independent Speaker Identification using Audio Looping with Margin based Loss Functions

%J International Journal of Computer Applications

%@ 0975-8887

%V 187

%N 59

%P 1-8

%D 2025

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Speaker identification has become a crucial component in various applications, including security systems, virtual assistants, and personalized user experiences. This paper investigates the effectiveness of CosFace Loss and ArcFace Loss for text-independent speaker identification using a Convolutional Neural Network architecture based on the VGG16 model, modified to accept mel spectrogram inputs of variable size generated from the VoxCeleb1 dataset. Both loss functions are implemented and their effects on model accuracy and robustness are analyzed, with the Softmax loss function serving as the comparative baseline. The study also examines how mel spectrogram size and clip duration influence model performance, using 3 seconds as the baseline duration and 10 seconds as the maximum. With this architecture, the experimental results show higher identification accuracy for the margin-based losses than for traditional Softmax loss. Finally, the paper discusses the implications of these findings for future research.
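As a rough illustration of the two margin-based losses named in the abstract, the sketch below computes scaled cosine logits for a single embedding and applies the margin to the target class only, following the published definitions: CosFace replaces cos(θ_y) with cos(θ_y) − m, and ArcFace replaces it with cos(θ_y + m). The function names, the pure-Python list representation, and the values of the scale `s` and margin `m` are assumptions made for illustration; they are not taken from the paper, whose actual settings and implementation are not given on this page.

```python
import math


def _unit(v):
    """Return v scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def margin_logits(embedding, class_weights, label, kind="none", s=30.0, m=0.35):
    """Scaled cosine logits with an optional margin on the target class.

    kind="none":    s * cos(theta_j) for every class (no margin)
    kind="cosface": target logit becomes s * (cos(theta_y) - m)
    kind="arcface": target logit becomes s * cos(theta_y + m)

    s and m are illustrative defaults, not values reported by the paper.
    """
    e = _unit(embedding)
    cosines = []
    for w in class_weights:
        c = sum(a * b for a, b in zip(e, _unit(w)))
        cosines.append(max(-1.0, min(1.0, c)))  # clamp against rounding error

    if kind == "cosface":
        cosines[label] -= m
    elif kind == "arcface":
        # Simplified: no special handling for theta_y + m > pi.
        cosines[label] = math.cos(math.acos(cosines[label]) + m)
    elif kind != "none":
        raise ValueError(f"unknown kind: {kind!r}")

    return [s * c for c in cosines]


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


# Toy example: a 2-D embedding aligned with the class-0 weight vector.
emb = [1.0, 0.0]
weights = [[1.0, 0.0], [0.0, 1.0]]
for kind, margin in (("none", 0.0), ("cosface", 0.35), ("arcface", 0.5)):
    logits = margin_logits(emb, weights, label=0, kind=kind, m=margin)
    print(kind, logits, cross_entropy(logits, 0))
```

In this toy case the margin lowers the target-class logit even though the embedding already points exactly at its class weight, so the loss stays larger than in the no-margin case; that extra pressure is what pushes embeddings of the same speaker closer together during training. Note that `kind="none"` is a normalized-cosine softmax and is not necessarily the same as the Softmax baseline used in the paper.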

Index Terms

Computer Science

Information Sciences

Keywords

Speaker Identification, Loss Functions, Data Augmentation, Mel Spectrograms