Research Article

The Impact of Transformed Features in Automating the Swahili Document Classification

by Thomas Tesha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 127 - Number 16
Year of Publication: 2015
Authors: Thomas Tesha
10.5120/ijca2015906707

Thomas Tesha. The Impact of Transformed Features in Automating the Swahili Document Classification. International Journal of Computer Applications. 127, 16 (October 2015), 37-42. DOI=10.5120/ijca2015906707

@article{ 10.5120/ijca2015906707,
author = { Thomas Tesha },
title = { The Impact of Transformed Features in Automating the Swahili Document Classification },
journal = { International Journal of Computer Applications },
issue_date = { October 2015 },
volume = { 127 },
number = { 16 },
month = { October },
year = { 2015 },
issn = { 0975-8887 },
pages = { 37-42 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume127/number16/22817-2015906707/ },
doi = { 10.5120/ijca2015906707 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:18:14.903709+05:30
%A Thomas Tesha
%T The Impact of Transformed Features in Automating the Swahili Document Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 127
%N 16
%P 37-42
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper reports experimental results aimed at identifying feature transformation techniques that can improve the automatic classification of Swahili documents, that is, techniques that raise the classification rate by enhancing document separability and accuracy. The experiments covered Relative Frequency (RF), Power Transformation (PT), and Relative Frequency with Power Transformation (RFPT) features. TFIDF term weighting and absolute features (AF) were also studied. Feature dimensionality was reduced with Principal Component Analysis. Support Vector Machine (SVM) and k-NN classifiers were used as learning algorithms, and the micro-averaged F-measure was adopted to evaluate how the features performed with each classifier. The experimental results showed that RFPT features worked better with the Support Vector Machine classifier, unlike with k-NN, in improving the classification rate by enhancing document separability and accuracy in automatic Swahili document classification.
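The abstract names the feature transformations without defining them, so the following is only a minimal sketch under stated assumptions, not the author's implementation. It assumes RF divides each term count by the document's total count, PT raises each feature to a power between 0 and 1, and RFPT applies PT to the RF features. The exponent 0.5, the function names, and the sample data are all hypothetical. The micro-averaged F-measure is computed in the usual way by pooling true positives, false positives, and false negatives over all classes.

```python
def relative_frequency(counts):
    """RF (assumed form): each term count divided by the document total."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def power_transform(values, exponent=0.5):
    """PT (assumed form): raise each non-negative feature to a power in (0, 1).

    The default exponent 0.5 is an illustrative choice, not a value
    taken from the paper.
    """
    return [v ** exponent for v in values]


def rfpt(counts, exponent=0.5):
    """RFPT (assumed form): power transformation applied to RF features."""
    return power_transform(relative_frequency(counts), exponent)


def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F-measure for single-label classification.

    Counts are pooled over every class before precision and recall
    are computed.
    """
    tp = fp = fn = 0
    for label in set(true_labels) | set(predicted_labels):
        for truth, pred in zip(true_labels, predicted_labels):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical absolute features (AF): raw term counts for one document.
    absolute = [1, 0, 4, 4]
    print("RF:  ", relative_frequency(absolute))
    print("PT:  ", power_transform(absolute))
    print("RFPT:", rfpt(absolute))
    # Hypothetical labels, used only to exercise the metric.
    print("micro-F:", micro_f1(["news", "sport", "news"], ["news", "news", "news"]))
```

In a full pipeline, the transformed vectors would then go through Principal Component Analysis and into the SVM and k-NN classifiers described above. Those steps are left out here because the abstract gives no parameters for them.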

Index Terms

Computer Science
Information Sciences

Keywords

Machine learning algorithm, Support vector machine, Swahili, Swahili document classification