
Therapy Bot: A Multimodal Stress/Emotion Recognition and Alleviation System

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2021
Pradeep Tiwari, A.D. Darji

Pradeep Tiwari and A.D. Darji. Therapy Bot: A Multimodal Stress/Emotion Recognition and Alleviation System. International Journal of Computer Applications 183(33):1-8, October 2021.

@article{10.5120/ijca2021921719,
	author = {Pradeep Tiwari and A.D. Darji},
	title = {Therapy Bot: A Multimodal Stress/Emotion Recognition and Alleviation System},
	journal = {International Journal of Computer Applications},
	issue_date = {October 2021},
	volume = {183},
	number = {33},
	month = {Oct},
	year = {2021},
	issn = {0975-8887},
	pages = {1-8},
	numpages = {8},
	url = {},
	doi = {10.5120/ijca2021921719},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}


Digitalization has brought technological development and new opportunities for mental health care, especially during a pandemic, when social distancing is necessary. This paper therefore focuses on building a therapy bot application that recognizes a person's stress/emotion and provides suitable therapy. The bot is based on Multimodal Emotion Recognition (MER), which can be viewed conceptually as a superset of Speech Emotion Recognition (SER) and Textual Emotion Recognition (TER). The main challenges in designing the therapy bot are extracting discriminative features and giving the bot the human abilities of a therapist. With these difficulties in mind, features are strategically selected from the speech and text modalities. The features extracted from the speech segment are Mel-Frequency Cepstral Coefficients (MFCC), delta MFCC and acceleration MFCC, while Term Frequency-Inverse Document Frequency (TF-IDF) vectorization is used for the text segment. A Support Vector Machine (SVM) classifier was used to compute the confidence of each emotion from each modality, and these confidence outputs were then fused to evaluate the MER performance of the bot. Results computed in real time indicated that MER outperforms both SER and TER.
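Delta and acceleration MFCCs are time derivatives of the static coefficients, usually estimated by a regression over neighbouring frames. The abstract does not give the exact formula or window, so the following is a minimal sketch assuming the common regression form d_t = sum_{k=1..N} k (c[t+k] - c[t-k]) / (2 sum_{k=1..N} k^2) with N = 2 and repeated edge frames. The MFCC matrix (frames x coefficients) is assumed to come from an upstream extractor, and the function names are illustrative only.

```python
from typing import List

Matrix = List[List[float]]  # rows are frames, columns are cepstral coefficients


def delta(features: Matrix, n: int = 2) -> Matrix:
    """Regression-based deltas over a window of +/- n frames (assumed N = 2).

    d_t = sum_{k=1..n} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..n} k^2),
    with edge frames repeated so the output keeps the input's frame count.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    num_frames = len(features)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: Matrix = []
    for t in range(num_frames):
        row = []
        for j in range(len(features[t])):
            acc = 0.0
            for k in range(1, n + 1):
                nxt = features[min(t + k, num_frames - 1)][j]  # clamp at the last frame
                prv = features[max(t - k, 0)][j]               # clamp at the first frame
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_mfcc_delta_accel(mfcc: Matrix, n: int = 2) -> Matrix:
    """Concatenate static MFCCs, deltas and accelerations (delta of delta) per frame."""
    d = delta(mfcc, n)
    a = delta(d, n)
    return [m + dd + aa for m, dd, aa in zip(mfcc, d, a)]
```

Stacking the three blocks triples the per-frame dimensionality; for example, 13 static coefficients become a 39-dimensional feature vector.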

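TF-IDF weights each term by how often it occurs in an utterance and down-weights terms that are common across the training corpus. The abstract does not specify the variant, so this sketch assumes the smoothed IDF, idf(t) = ln((1 + N) / (1 + df(t))) + 1, applied to raw counts and followed by L2 normalization (the same weighting scikit-learn's TfidfVectorizer uses by default, though with a simpler tokenizer here). The helper names and the tokenizer are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters, digits and apostrophes (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_idf(docs: List[str]) -> Dict[str, float]:
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1, where df counts documents."""
    n_docs = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # each term counted at most once per document
    return {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Raw term counts weighted by IDF, then L2-normalized; terms unseen in training are ignored."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    weights = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm > 0 else {}
```

Terms that occur in only a few training utterances (such as an emotion word) receive a larger weight than function words shared by many utterances, which is what makes the resulting sparse vectors usable as classifier input.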

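The abstract states that the per-modality SVM confidences were fused, but not how. A minimal sketch, assuming a weighted sum rule over class confidences (for example, probabilities such as those returned by scikit-learn's SVC(probability=True).predict_proba); the emotion labels, the weight w_speech and the example numbers are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical label set; the abstract does not list the emotion classes used.
EMOTIONS: Tuple[str, ...] = ("angry", "happy", "neutral", "sad")


def fuse_confidences(
    speech: Sequence[float], text: Sequence[float], w_speech: float = 0.5
) -> Dict[str, float]:
    """Weighted sum rule: w * p_speech + (1 - w) * p_text, computed per emotion class."""
    if len(speech) != len(EMOTIONS) or len(text) != len(EMOTIONS):
        raise ValueError("each modality must give one confidence per emotion")
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    return {
        label: w_speech * s + (1.0 - w_speech) * t
        for label, s, t in zip(EMOTIONS, speech, text)
    }


def predict(fused: Dict[str, float]) -> str:
    """Return the emotion with the highest fused confidence."""
    return max(fused, key=fused.get)


if __name__ == "__main__":
    p_speech = [0.1, 0.5, 0.1, 0.3]  # example SER confidences (speech leans "happy")
    p_text = [0.1, 0.2, 0.1, 0.6]    # example TER confidences (text leans "sad")
    print(predict(fuse_confidences(p_speech, p_text)))       # sad
    print(predict(fuse_confidences(p_speech, p_text, 0.8)))  # happy
```

With equal weights the text modality's stronger vote decides the outcome; shifting weight toward speech changes the decision, so in practice such a weight would typically be tuned on held-out data.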


Therapy Bot; Mental health; Emotion Recognition; MFCC; TF-IDF; Speech processing