International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 5
Year of Publication: 2025
Authors: Srijen Mishra, Syed Wajahat Abbas Rizvi
Srijen Mishra, Syed Wajahat Abbas Rizvi. AI-Driven Speech Emotion Detection: A Systematic Approach to Voice-based Sentiment Analysis. International Journal of Computer Applications. 187, 5 (May 2025), 43-48. DOI=10.5120/ijca2025924877
This research presents a speech emotion recognition (SER) system that uses deep learning, specifically Long Short-Term Memory (LSTM) networks, to classify emotions from audio signals. The system extracts Mel-Frequency Cepstral Coefficients (MFCC) together with their delta and delta-delta features, so that both the spectral shape of each frame and its change over time are represented. Two widely used emotional speech datasets, TESS and RAVDESS, were combined to improve model generalization across diverse voices and expressions. The audio data was preprocessed to standardize sampling rates and durations; MFCC features were then extracted and mean-pooled over time. The LSTM model, trained on the combined dataset, classifies seven emotion classes: angry, calm, disgust, fear, happy, sad, and surprise. The proposed system achieved high accuracy, demonstrating the effectiveness of temporal feature modeling in capturing emotional cues from speech. This study highlights the significance of deep learning in voice-based sentiment analysis, with potential applications in human-computer interaction, virtual assistants, and mental health monitoring.
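The feature step described in the abstract (MFCC, delta, delta-delta, then mean pooling over time) can be illustrated with a minimal sketch. The abstract does not say how the deltas were computed, so the code below assumes the common regression-based delta formula with edge padding and a window half-width of 2. The function names `delta`, `mean_pool`, and `utterance_vector` are hypothetical, and the tiny hand-made matrix merely stands in for real MFCC frames; it is not data from the paper.

```python
def delta(frames, width=2):
    """Regression-based delta over the time axis (assumed formula, not
    confirmed by the source):
        d[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2),  k = 1..width
    Frames beyond either end are replaced by the nearest edge frame."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for c in range(n_coeffs):
            num = sum(
                k * (frames[min(t + k, n_frames - 1)][c]
                     - frames[max(t - k, 0)][c])
                for k in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


def mean_pool(frames):
    """Average every coefficient over time, giving one value per coefficient."""
    n_frames = len(frames)
    return [sum(column) / n_frames for column in zip(*frames)]


def utterance_vector(mfcc, width=2):
    """Concatenate time-averaged static, delta, and delta-delta features."""
    d1 = delta(mfcc, width)   # first-order change over time
    d2 = delta(d1, width)     # second-order change over time
    return mean_pool(mfcc) + mean_pool(d1) + mean_pool(d2)


# Toy stand-in for MFCC frames: 5 time steps x 2 coefficients (hypothetical).
toy_mfcc = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
vector = utterance_vector(toy_mfcc)
print(len(vector))  # 6 values: 2 static + 2 delta + 2 delta-delta
```

Pooling over time yields a vector whose length depends only on the number of coefficients, not on how long the utterance is, which is one reason it is a convenient way to standardize inputs of varying duration.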