Research Article

Assessing the Effectiveness of Various Text Classification Algorithms in Customer Complaint Classification: An Informative Resource for Data Scientists and Data Analysts

by Yehia Helmy, Merna Ashraf, Laila Abdelhamid
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 2
Year of Publication: 2024
Authors: Yehia Helmy, Merna Ashraf, Laila Abdelhamid
DOI: 10.5120/ijca2024923346

Yehia Helmy, Merna Ashraf, Laila Abdelhamid. Assessing the Effectiveness of Various Text Classification Algorithms in Customer Complaint Classification: An Informative Resource for Data Scientists and Data Analysts. International Journal of Computer Applications. 186, 2 (Jan 2024), 8-16. DOI=10.5120/ijca2024923346

@article{ 10.5120/ijca2024923346,
author = { Yehia Helmy and Merna Ashraf and Laila Abdelhamid },
title = { Assessing the Effectiveness of Various Text Classification Algorithms in Customer Complaint Classification: An Informative Resource for Data Scientists and Data Analysts },
journal = { International Journal of Computer Applications },
issue_date = { Jan 2024 },
volume = { 186 },
number = { 2 },
month = { Jan },
year = { 2024 },
issn = { 0975-8887 },
pages = { 8-16 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number2/33043-2024923346/ },
doi = { 10.5120/ijca2024923346 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:29:30.147398+05:30
%A Yehia Helmy
%A Merna Ashraf
%A Laila Abdelhamid
%T Assessing the Effectiveness of Various Text Classification Algorithms in Customer Complaint Classification: An Informative Resource for Data Scientists and Data Analysts
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 2
%P 8-16
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Customers become dissatisfied for many reasons, some of which are beyond a company's control, and a complaint is the means by which they express that dissatisfaction. With the rapid advancement of technology and the many convenient channels now available for voicing complaints, including email, the web, and chatbots, the volume of online complaints has grown exponentially. As a result, classifying each complaint under the pertinent issue in a timely manner has become a difficult task. Selecting an appropriate classification model and fitting it with suitable training and testing ratios is a recurring challenge for researchers. This paper implements and compares the performance of six machine learning algorithms used for multi-class text classification, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GB), under two types of sampling (random and stratified) and five data splitting ratios (50:50, 60:40, 70:30, 80:20, and 90:10) on a complaint dataset. The aim is to provide a roadmap that helps researchers working in text classification select the optimum classification model and splitting ratio. The results demonstrate that DT, with an accuracy of 99%, an F1-measure of 99%, and a runtime of 1 second, outperformed all other algorithms, and that 80:20 is the most suitable splitting ratio, fitting most algorithms and serving as a safe baseline. The results also indicate that stratified sampling produces better results than random sampling in multi-class text classification.
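
The difference between the two sampling schemes compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label names, class sizes, helper functions (`random_split`, `stratified_split`), and seed are all hypothetical, and each ratio is assumed to read as training share followed by testing share. Only the Python standard library is used.

```python
import random
from collections import Counter, defaultdict

def random_split(labels, train_frac, seed=0):
    """Shuffle all indices and cut once; class proportions are left to chance."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    cut = round(len(idx) * train_frac)
    return idx[:cut], idx[cut:]

def stratified_split(labels, train_frac, seed=0):
    """Split each class separately so train and test keep the overall class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        cut = round(len(members) * train_frac)
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test

# Hypothetical imbalanced complaint labels (not the paper's dataset).
labels = ["billing"] * 700 + ["delivery"] * 200 + ["fraud"] * 100

# The five splitting ratios from the abstract, as training fractions (assumed).
for train_frac in (0.5, 0.6, 0.7, 0.8, 0.9):
    for name, split in (("random", random_split), ("stratified", stratified_split)):
        train, test = split(labels, train_frac)
        test_counts = Counter(labels[i] for i in test)
        print(f"{train_frac:.0%} train, {name:10s} test counts: {dict(test_counts)}")
```

With stratified sampling, every test set keeps the 70/20/10 class mix of the full label list, whereas the random split lets minority classes drift. That drift matters most at small test shares such as 90:10, which is one plausible reason for the advantage the abstract reports for stratified sampling.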

Index Terms

Computer Science
Information Sciences

Keywords

Text classification, Data splitting, Supervised machine learning, Multi-Classification, Random sampling and Stratified sampling, Complaint handling