Research Article

Emotion Recognition from Speech using Teager based DSCC Features

Published on October 2013 by Santosh V. Chapaneri, Deepak J. Jayaswal
International Conference on Communication Technology
Foundation of Computer Science USA
ICCT - Number 1
October 2013

Santosh V. Chapaneri, Deepak J. Jayaswal . Emotion Recognition from Speech using Teager based DSCC Features. International Conference on Communication Technology. ICCT, 1 (October 2013), 15-20.

@article{
author = { Santosh V. Chapaneri, Deepak J. Jayaswal },
title = { Emotion Recognition from Speech using Teager based DSCC Features },
journal = { International Conference on Communication Technology },
issue_date = { October 2013 },
volume = { ICCT },
number = { 1 },
month = { October },
year = { 2013 },
issn = { 0975-8887 },
pages = { 15-20 },
numpages = 6,
url = { /proceedings/icct/number1/13645-1304/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Communication Technology
%A Santosh V. Chapaneri
%A Deepak J. Jayaswal
%T Emotion Recognition from Speech using Teager based DSCC Features
%J International Conference on Communication Technology
%@ 0975-8887
%V ICCT
%N 1
%P 15-20
%D 2013
%I International Journal of Computer Applications
Abstract

Emotion recognition from speech has emerged as an important research area in the recent past. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states; seven are considered here: anger, boredom, disgust, fear, happiness, sadness and neutral. The speech samples are taken from the Berlin emotional database, and the features extracted from these utterances are Teager-based delta-spectral cepstral coefficients (T-DSCC), which are shown to perform better than MFCC. Dynamic Time Warping (DTW) and its variant, Improved Features for DTW (IFDTW), are used as classifiers to distinguish the different emotional states. Unlike in conventional DTW, we do not use the minimum distance for classification. Rather, the median distance criterion is employed for improved emotion classification. The proposed emotion recognition system gives an overall classification accuracy of 97.52%.
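
The abstract does not spell out the details of the pipeline, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation. It shows the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and a plain DTW classifier that picks the emotion whose templates have the smallest median distance to the test utterance. Taking the median per emotion class is an assumption about how the criterion is applied; the function names, the toy one-dimensional "feature" sequences and the labels are all hypothetical, and neither T-DSCC extraction nor IFDTW is reproduced here.

```python
from statistics import median


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def dtw_distance(a, b):
    """Plain DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i of a and frame j of b
            d = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(test_seq, templates, reduce=median):
    """Score each class by reduce(distances to its templates); pick the smallest."""
    scores = {
        label: reduce([dtw_distance(test_seq, t) for t in seqs])
        for label, seqs in templates.items()
    }
    return min(scores, key=scores.get)


# Toy data (hypothetical): each frame is a 1-D feature vector.
templates = {
    "neutral": [
        [[0.0], [1.2], [0.0]],
        [[0.0], [0.8], [0.0]],
        [[0.0], [1.1], [0.1]],
    ],
    "anger": [
        [[0.0], [1.0], [0.0]],  # outlier template that happens to match the test
        [[3.0], [5.0], [3.0]],
        [[4.0], [6.0], [4.0]],
    ],
}
test_seq = [[0.0], [1.0], [0.0]]

label_median = classify(test_seq, templates, reduce=median)
label_min = classify(test_seq, templates, reduce=min)
print("median criterion:", label_median)
print("minimum criterion:", label_min)
```

In this toy setup a single outlier template in the wrong class is enough to mislead the minimum-distance rule, whereas the median over each class's templates is less sensitive to it. That is one plausible reading of why a median criterion could help; the paper itself should be consulted for the actual rule and features.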

References
  1. M. Ayadi, M. Kamel, and F. Karray, "Survey on Speech Emotion Recognition: Features, Classification Schemes, and Databases," Pattern Recognition, vol. 44, no. 3, pp. 572-587, Mar. 2011
  2. R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed., Wiley, New York, 2001
  3. D. Ververidis, C. Kotropoulos, and I. Pitas, "Automatic emotional speech classification", IEEE Intl. Conf. Acoustics, Speech and Signal Processing, vol. 1, pp. 593-596, Montreal, May 2004
  4. Z. Xiao, E. Dellandrea, W. Dou, and L. Chen, "Features extraction and selection for emotional speech classification", IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 411-416, Sep 2005
  5. T. Pao, Y. Chen, J. Yeh, and Y. Chang, "Emotion recognition and evaluation of Mandarin speech using weighted D-KNN classification", 17th Conf. on Computational Linguistics and Speech Processing, pp. 203-212, Sep 2005
  6. T. Pao, Y. Chen, J. Yeh, and P. Li, "Mandarin emotional speech recognition based on SVM and NN", Proc. 18th Intl. Conf. on Pattern Recognition (ICPR'06), vol. 1, pp. 1096-1100, Sep 2006
  7. B. Yang, and M. Lugger, "Emotion recognition from speech signals using new harmony features", Journal of Signal Processing, vol. 90, no. 5, pp. 1415-1423, 2010
  8. S. Wu, T. Falk, and W. Chan, "Automatic speech emotion recognition using modulation spectral features", Speech Communication, vol. 53, no. 5, pp. 768-785, 2011
  9. T. Polzehl, A. Schmitt, F. Metze, and M. Wagner, "Anger recognition in speech using acoustic and linguistic cues", Speech Communication, vol. 53, pp. 1198-1209, 2011
  10. S. Haq, P. Jackson, and J. Edge, "Audio-visual feature selection and reduction for emotion classification", Proc. Intl. Conf. on Auditory Visual Speech Processing, pp. 185-190, 2008
  11. D. Gharavian, M. Sheikhan, A. Nazerieh, and S. Garoucy, "Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network", Neural Computing and Applications, Springer, vol. 12, no. 8, pp. 2115-2126, Nov 2012
  12. A. Batliner, S. Steidl, B. Schuller, D. Seppi, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, V. Aharonson, L. Kessous, and N. Amir, "Whodunnit - Searching for the most important feature types signalling emotion-related user states in speech", ACM Journal of Computer Speech and Language, vol. 25, no. 1, pp. 4-28, Jan 2011
  13. M. Krishna, P. Lakshmi, Y. Srinivas, and S. Devi, "Emotion recognition using dynamic time warping technique for isolated words", Intl. Journal Computer Science Issues, vol. 8, no. 5, pp. 306-309, Sep 2011
  14. E. Fersini, E. Messina, and F. Archetti, "Emotional states in judicial courtrooms: an experimental investigation", Speech Communication, vol. 54, no. 1, pp. 11-22, Jan 2012
  15. E. Ayadi, M. Kamel, and F. Karray, "Survey on speech emotion recognition: features, classification schemes, and databases", Pattern Recognition, vol. 44, no. 3, pp. 572-587, Mar 2011
  16. J. Wang, Z. Han, and S. Lung, "Speech emotion recognition system based on genetic algorithm and neural network," IEEE Intl. Conf. Image Analysis and Signal Processing, pp. 578-582, Oct 2011
  17. M. Kockmann, L. Burget, and J. Černocký, "Application of speaker and language identification state-of-the-art techniques for emotion recognition", Speech Communication, vol. 53, no. 10, pp. 1172-1185, Nov 2011
  18. R. López, J. Silovsky, and M. Kroul, "Enhancement of emotion detection in spoken dialogue systems by combining several information sources", Speech Communication, vol. 53, no. 10, pp. 1210-1228, Nov 2011
  19. F. Burkhardt, A. Paeschke, M. Rolfes, W. Sendlmeier, and B. Weiss, "A database of German emotional speech", Proc. Interspeech-2005, Lisbon, Portugal, pp. 1-4, Jan 2005
  20. S. Chapaneri, "Spoken digits recognition using weighted MFCC and improved features for dynamic time warping", Intl. Journal Computer Applications, vol. 40, no. 3, pp. 6-12, Feb 2012
  21. L. Rabiner, and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993
  22. H. Hassanein, and M. Rudko, "On the use of Discrete Cosine Transform in cepstral analysis", IEEE Trans. Acoustics, Speech and Signal Processing, vol. 32, no. 4, pp. 922-925, 1984
  23. K. Kumar, C. Kim, and R. Stern, "Delta-Spectral Cepstral Coefficients for robust speech recognition", IEEE Intl. Conf. Acoustics, Speech, and Signal Processing, pp. 4784-4787, May 2011
  24. H. Teager, and S. Teager, "A phenomenological model for vowel production in the vocal tract," Speech Science: Recent Advances, pp. 73-109, 1985
  25. T. Thomas, "A finite element model of fluid flow in the vocal tract", Journal of Computer Speech and Language, vol. 1, no. 2, pp. 131-151, Dec 1986
  26. J. Kaiser, "On a simple algorithm to calculate the 'energy' of a signal", IEEE Intl. Conf. Acoustics, Speech and Signal Processing, vol. 1, pp. 381-384, Apr 1990
  27. S. Theodoridis, and K. Koutroumbas, Pattern Recognition, 4th ed., Academic Press, 2008
  28. H. Sakoe, and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition", IEEE Trans. Acoustics, Speech and Signal Processing, vol. ASSP-26, Feb 1978
  29. F. Itakura, "Minimum prediction residual principle applied to speech recognition", IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-23, pp. 52-72, Feb 1975
  30. S. Chapaneri, and D. Jayaswal, "Efficient speech recognition system for isolated digits", Intl. Journal Computer Science and Engineering Technologies, vol. 4, no. 3, pp. 228-236, Mar 2013
  31. S. Salvador, and P. Chan, "FastDTW: toward accurate dynamic time warping in linear time and space", Proc. 3rd KDD Workshop on Mining Temporal and Sequential Data, pp. 70-80, Aug 2004
  32. H. Patil, and T. Basu, "Detection of bilingual twins by Teager energy based features," Proc. Intl. Conf. Signal Processing and Communication, pp. 32-36, Dec 2004
Index Terms

Computer Science
Information Sciences

Keywords

Emotion Recognition, MFCC, DSCC, Teager Energy Operator, Dynamic Time Warping