Research Article

Feature Selection for Cancer Classification: An SVM based Approach

by El Sayed Abdel Wahed, Ibrahim Al Emam, Amr Badr
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 46 - Number 8
Year of Publication: 2012
Authors: El Sayed Abdel Wahed, Ibrahim Al Emam, Amr Badr
10.5120/6928-9371

El Sayed Abdel Wahed, Ibrahim Al Emam, Amr Badr. Feature Selection for Cancer Classification: An SVM based Approach. International Journal of Computer Applications. 46, 8 (May 2012), 20-26. DOI=10.5120/6928-9371

@article{ 10.5120/6928-9371,
author = { El Sayed Abdel Wahed and Ibrahim Al Emam and Amr Badr },
title = { Feature Selection for Cancer Classification: An SVM based Approach },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 46 },
number = { 8 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 20-26 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume46/number8/6928-9371/ },
doi = { 10.5120/6928-9371 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:39:13.941443+05:30
%A El Sayed Abdel Wahed
%A Ibrahim Al Emam
%A Amr Badr
%T Feature Selection for Cancer Classification: An SVM based Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 46
%N 8
%P 20-26
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Cancer is an immense problem facing Egypt and a notorious killer. The magnitude of the disease remains unknown. It is also a significant health problem in many other developing countries. Better diagnosis and classification can eventually reduce this burden. Classification is a machine learning technique that predicts the class to which a data sample belongs. Classification techniques include the Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), and Naive Bayes (NB) classifiers. Feature selection for the classification of cancer data means discovering the feature values and profiles of diseased and healthy samples, and using this knowledge to predict the state of new samples. In this paper, we propose an approach to feature selection that uses SVM in three different ways. First, SVM is used as a classifier to build a model from the training data; the purpose is to measure the model's accuracy in predicting the category of the test data, compared with other classifiers. Second, SVM is used as a learner: the data is clustered via K-Means into 3, 4, and 5 clusters, and different classifiers (SVM, k-NN, and NB) are then applied to the clustered data. Two validation methods are used to estimate the accuracy of each classifier: 10-fold cross-validation (CV) and leave-one-out. Third, SVM is used for feature weighting, by estimating each feature's importance relative to a target class. The experimental results show that SVM achieves the best accuracy as a classifier, a learner, and a feature weighting method compared with the other classifiers used in this study.
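The three uses of SVM described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, a synthetic dataset in place of the cancer data, a linear kernel, k = 5 for k-NN, and (for the second step) that the K-Means cluster assignments serve as the labels the classifiers are trained to predict. None of these choices are stated in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic two-class stand-in for a cancer dataset (assumption).
X, y = make_classification(n_samples=120, n_features=20,
                           n_informative=5, random_state=0)

# The three classifiers compared in the abstract; hyperparameters are assumed.
classifiers = {
    "SVM": SVC(kernel="linear"),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "NB": GaussianNB(),
}

# 1) SVM as a classifier: fit on training data, score on held-out test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
holdout_accuracy = {
    name: clf.fit(X_train, y_train).score(X_test, y_test)
    for name, clf in classifiers.items()
}

# 2) Cluster with K-Means into 3, 4 and 5 clusters, then train each
#    classifier to predict the cluster labels (assumed interpretation),
#    validating with 10-fold CV and leave-one-out.
ten_fold = KFold(n_splits=10, shuffle=True, random_state=0)
cluster_accuracy = {}
for k in (3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    for name, clf in classifiers.items():
        cv10 = cross_val_score(clf, X, labels, cv=ten_fold).mean()
        loo = cross_val_score(clf, X, labels, cv=LeaveOneOut()).mean()
        cluster_accuracy[(k, name)] = (cv10, loo)

# 3) SVM for feature weighting: rank features by the magnitude of the
#    linear SVM coefficients for the target class.
svm = SVC(kernel="linear").fit(X, y)
weights = np.abs(svm.coef_).ravel()
ranking = np.argsort(weights)[::-1]  # feature indices, most important first

print(holdout_accuracy)
print(cluster_accuracy)
print(ranking[:5])
```

Ranking features by the absolute value of linear SVM weights is one common weighting scheme; the paper may use a different one.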

Index Terms

Computer Science
Information Sciences

Keywords

Feature Selection, Cancer Data Classification, K-means, Support Vector Machine, k-NN, Naive Bayes