Research Article

An Efficient Feature Selection Method for Arabic Text Classification

by Bilal Hawashin, Ayman Mansour, Shadi Aljawarneh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 83 - Number 17
Year of Publication: 2013
10.5120/14666-2588

Bilal Hawashin, Ayman Mansour, Shadi Aljawarneh. An Efficient Feature Selection Method for Arabic Text Classification. International Journal of Computer Applications. 83, 17 (December 2013), 1-6. DOI=10.5120/14666-2588

@article{ 10.5120/14666-2588,
author = { Bilal Hawashin, Ayman Mansour, Shadi Aljawarneh },
title = { An Efficient Feature Selection Method for Arabic Text Classification },
journal = { International Journal of Computer Applications },
issue_date = { December 2013 },
volume = { 83 },
number = { 17 },
month = { December },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume83/number17/14666-2588/ },
doi = { 10.5120/14666-2588 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:59:37.391251+05:30
%A Bilal Hawashin
%A Ayman Mansour
%A Shadi Aljawarneh
%T An Efficient Feature Selection Method for Arabic Text Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 83
%N 17
%P 1-6
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper proposes an efficient, Chi-Square-based feature selection method for Arabic text classification. In data mining, feature selection is a preprocessing step that can improve classification performance. Although a few works have studied the effect of feature selection methods on Arabic text classification, only a limited number of methods were compared, and different works used different datasets. This paper improves on previous work in three respects. First, it proposes a new, efficient feature selection method for enhancing Arabic text classification. Second, it compares a larger number of existing feature selection methods. Third, it adopts two publicly available datasets and encourages future works to adopt them so that results can be compared fairly. Our experiments show that the proposed method outperformed the existing methods in terms of accuracy.
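
For readers unfamiliar with Chi-Square feature selection, the following is a minimal sketch of the standard Chi-Square term scoring commonly used in text categorization. It is not the method proposed in this paper, whose details are not given in the abstract. The function names `chi_square` and `select_features`, the choice of keeping each term's maximum score over classes, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-Square statistic for one term/class 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document).
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest Chi-Square score.

    docs is a list of token lists and labels the parallel list of classes.
    Each term is scored by its maximum Chi-Square value over all classes
    (an illustrative choice; averaging over classes is another option).
    """
    n = len(docs)
    class_sizes = Counter(labels)
    df = Counter()        # term -> number of documents containing it
    df_class = Counter()  # (term, class) -> documents of that class containing it
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_class[(term, label)] += 1

    scores = {}
    for term in df:
        best = 0.0
        for cls, size in class_sizes.items():
            a = df_class[(term, cls)]
            b = df[term] - a
            c = size - a
            d = n - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best

    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus (English placeholders standing in for Arabic tokens).
docs = [
    ["goal", "match", "the"],
    ["match", "team", "the"],
    ["market", "stock", "the"],
    ["stock", "bank", "the"],
]
labels = ["sport", "sport", "economy", "economy"]
top_terms = select_features(docs, labels, 2)
```

In this toy corpus, a term that appears in every document receives a score of zero, while terms that occur only within a single class score highest. A classifier would then be trained on the selected terms only.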

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Arabic Text Retrieval, Feature Selection, Chi-Square