Text-independent Speaker Identification in Emotional and Whispered Speech Environments

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Authors:
Naresh P. Jawarkar, Raghunath S. Holambe, Tapan Kumar Basu
DOI: 10.5120/ijca2017915465

Naresh P Jawarkar, Raghunath S Holambe and Tapan Kumar Basu. Text-independent Speaker Identification in Emotional and Whispered Speech Environments. International Journal of Computer Applications 175(5):18-27, October 2017.

BibTeX

@article{10.5120/ijca2017915465,
	author = {Naresh P. Jawarkar and Raghunath S. Holambe and Tapan Kumar Basu},
	title = {Text-independent Speaker Identification in Emotional and Whispered Speech Environments},
	journal = {International Journal of Computer Applications},
	issue_date = {October 2017},
	volume = {175},
	number = {5},
	month = {Oct},
	year = {2017},
	issn = {0975-8887},
	pages = {18-27},
	numpages = {10},
	url = {http://www.ijcaonline.org/archives/volume175/number5/28484-2017915465},
	doi = {10.5120/ijca2017915465},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

This paper addresses the challenging task of closed-set, text-independent speaker identification in emotional and whispered speech environments. In the first phase of the work, a speaker identification system is developed using neutral speech and tested using speech samples covering the six basic emotions of anger, happiness, sadness, disgust, neutral, and fear. Performance is analyzed using three feature sets: Mel frequency cepstral coefficients (MFCC), line spectral frequencies (LSF), and temporal energy of subband cepstral coefficients (TESBCC). The second phase of the work addresses speaker identification in a whispered speech environment, where the performance of the system degrades drastically. A new feature, temporal Teager energy based subband cepstral coefficients (TTESBCC), is proposed, and the performance of the MFCC, TESBCC, weighted instantaneous frequency (WIF), and TTESBCC feature sets is compared for this task. A novel classifier fusion technique is developed, and its performance is compared with that of the individual classifiers. Two databases are used for experimentation: one with speech utterances of thirty-nine speakers recorded in the six basic emotions, and one with whispered speech utterances of twenty-five speakers. The speech utterances for the databases were recorded in Marathi, an Indian language. Fusion of classifiers is observed to considerably enhance speaker identification accuracy in both emotional and whispered speech environments.
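For readers unfamiliar with the Teager energy that the proposed TTESBCC feature is named after, the following is a minimal sketch of the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1]. It is not the authors' TTESBCC pipeline, whose subband decomposition and cepstral steps are not described in this abstract. The function name `teager_energy` and the test tone are illustrative assumptions.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator (assumed standard form, not the paper's code).

    Returns psi[n] = x[n]^2 - x[n-1] * x[n+1] for every interior sample,
    so the output is two samples shorter than the input.
    """
    if len(x) < 3:
        raise ValueError("need at least three samples")
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (an assumption, not data from the paper): a pure tone
# with amplitude A and digital frequency omega in radians per sample.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]

psi = teager_energy(tone)

# For a pure tone the operator is constant and equals A^2 * sin(omega)^2,
# so it responds to both amplitude and frequency of the signal.
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Because the operator needs only three neighbouring samples, it can track energy changes at a fine time scale, which is one common motivation for Teager-based features in speech work.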

Keywords

Speaker identification, whispered speech, temporal Teager energy based subband cepstral coefficients, emotional environment, classifier fusion.