Research Article

Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements

by Aymen Abu-errub
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 93 - Number 6
Year of Publication: 2014
Authors: Aymen Abu-errub
10.5120/16223-5674

Aymen Abu-errub. Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements. International Journal of Computer Applications. 93, 6 (May 2014), 40-45. DOI=10.5120/16223-5674

@article{ 10.5120/16223-5674,
author = { Aymen Abu-errub },
title = { Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 93 },
number = { 6 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 40-45 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume93/number6/16223-5674/ },
doi = { 10.5120/16223-5674 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:15:40.171416+05:30
%A Aymen Abu-errub
%T Arabic Text Classification Algorithm using TFIDF and Chi Square Measurements
%J International Journal of Computer Applications
%@ 0975-8887
%V 93
%N 6
%P 40-45
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text categorization is the process of classifying documents into a predefined set of categories based on the keywords they contain. Text classification is an extended type of text categorization in which the text is further categorized into sub-categories. Many algorithms have been proposed and implemented to solve the problem of English text categorization and classification. However, few studies have been carried out on categorizing and classifying Arabic text. Compared to English, Arabic text classification is considered very challenging because of the Arabic language's complex linguistic structure and its highly derivational nature, in which morphology plays a very important role. This paper proposes a new method for Arabic text classification in which a document is compared with predefined document categories based on its contents using the TF.IDF (Term Frequency times Inverse Document Frequency) measure; the document is then classified into the appropriate sub-category using the Chi Square measure.
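
The two measures named in the abstract can be sketched as follows. This is a minimal illustration using the common textbook forms of TF.IDF and of the Chi Square statistic for a 2x2 term/category table; the abstract does not give the paper's exact formulas, so the weighting variant, the function names `tf_idf` and `chi_square`, and the toy token lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Textbook TF.IDF weight (an assumed variant, not taken from the paper).

    term:       the token to weight
    doc_tokens: one tokenized document (list of tokens)
    corpus:     list of tokenized documents, used for document frequency
    """
    tf = Counter(doc_tokens)[term]                 # raw term frequency
    df = sum(1 for d in corpus if term in d)       # documents containing term
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)         # tf times idf


def chi_square(a, b, c, d):
    """Chi Square statistic for a term and a (sub-)category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0                                  # degenerate table
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical toy corpus of two tokenized documents.
corpus = [["a", "a", "b"], ["b", "c"]]
weight = tf_idf("a", corpus[0], corpus)            # 2 * ln(2)
score = chi_square(10, 0, 0, 10)                   # term perfectly marks the category
```

A term that occurs in every document gets a TF.IDF weight of zero, and a term distributed independently of the category gets a Chi Square score of zero, which is why both measures favour discriminative terms.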

Index Terms

Computer Science
Information Sciences

Keywords

Text Categorization, Text Classification, Term Frequency, Inverse Document Frequency, Chi Square