Research Article

Analysis of Feature Selection Algorithms on Classification: A Survey

by S. Vanaja, K. Ramesh Kumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 96 - Number 17
Year of Publication: 2014
Authors: S. Vanaja, K. Ramesh Kumar

S. Vanaja, K. Ramesh Kumar. Analysis of Feature Selection Algorithms on Classification: A Survey. International Journal of Computer Applications. 96, 17 (June 2014), 29-35. DOI=10.5120/16888-6910

@article{ 10.5120/16888-6910,
author = { S. Vanaja and K. Ramesh Kumar },
title = { Analysis of Feature Selection Algorithms on Classification: A Survey },
journal = { International Journal of Computer Applications },
issue_date = { June 2014 },
volume = { 96 },
number = { 17 },
month = { June },
year = { 2014 },
issn = { 0975-8887 },
pages = { 29-35 },
numpages = {7},
url = { },
doi = { 10.5120/16888-6910 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:22:01.776586+05:30
%A S. Vanaja
%A K. Ramesh Kumar
%T Analysis of Feature Selection Algorithms on Classification: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 96
%N 17
%P 29-35
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

This paper surveys feature selection algorithms that have been applied to different datasets to select the features relevant for binary and multiclass classification, with the aim of improving classifier accuracy. Recent research in medical diagnosis uses various classification algorithms to diagnose disease. When predicting a disease, the classification algorithm typically produces a binary result. When the dataset is multiclass, it is often reduced to a binary problem for simplicity, using a data reduction method, before the algorithm is applied for prediction. Reducing the original dataset in this way may degrade the quality of the data and affect the accuracy of the algorithm. To preserve the effectiveness of the data, multiclass data should be handled in its original form, with as little reduction as possible, so that the algorithm can achieve the best accuracy on it. Datasets with a very large number of attributes, on the order of thousands, need a suitable feature selection algorithm to select the relevant features and reduce space and time complexity. The performance of a classification algorithm is measured by how accurately it predicts the individual classes on a particular dataset. This accuracy depends mainly on selecting appropriate features from the original dataset, so feature selection algorithms play an important role in classification performance. Feature selection is one of the preprocessing techniques used in classification. This paper reviews different feature selection algorithms and their performance on different datasets.
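As a rough illustration of the abstract's central claim, that selecting relevant features can improve accuracy on a multiclass dataset left in its original form, the sketch below ranks features with a simple filter and compares classifier accuracy before and after selection. Nothing here is taken from the paper: the synthetic three-class data, the Fisher-score filter, the nearest-centroid classifier, and every function name are assumptions chosen only to keep the example small and dependency-free.

```python
import random
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Between-class scatter of the class means over pooled within-class scatter."""
    overall = mean(column)
    between = within = 0.0
    for cls in set(labels):
        values = [v for v, y in zip(column, labels) if y == cls]
        between += len(values) * (mean(values) - overall) ** 2
        within += len(values) * pvariance(values)
    return between / within if within > 0 else 0.0


def select_top_k(X, y, k):
    """Indices of the k features with the highest Fisher score (a filter method)."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def squared_distance(row, centroid, features):
    """Squared Euclidean distance using only the given feature indices."""
    return sum((row[j] - m) ** 2 for j, m in zip(features, centroid))


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Accuracy of a nearest-centroid classifier restricted to `features`."""
    centroids = {}
    for cls in set(y_train):
        rows = [r for r, y in zip(X_train, y_train) if y == cls]
        centroids[cls] = [mean(r[j] for r in rows) for j in features]
    correct = 0
    for row, label in zip(X_test, y_test):
        pred = min(centroids, key=lambda c: squared_distance(row, centroids[c], features))
        correct += pred == label
    return correct / len(y_test)


def make_dataset(n_per_class, n_noise, rng):
    """Hypothetical data: features 0 and 1 carry the class signal, the rest are noise."""
    centers = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (0.0, 3.0)}
    X, y = [], []
    for cls, (a, b) in centers.items():
        for _ in range(n_per_class):
            informative = [rng.gauss(a, 1.0), rng.gauss(b, 1.0)]
            noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(cls)
    return X, y


rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_dataset(60, 30, rng)
X_test, y_test = make_dataset(60, 30, rng)

all_features = list(range(len(X_train[0])))
selected = select_top_k(X_train, y_train, k=2)
acc_all = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, all_features)
acc_sel = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, selected)

print("selected features:", sorted(selected))
print(f"accuracy with all {len(all_features)} features: {acc_all:.2f}")
print(f"accuracy with {len(selected)} selected features: {acc_sel:.2f}")
```

The filter is fitted on the training split only and the three classes are never merged into a binary problem. On this toy data the noisy attributes dominate the distance computation, so discarding them both shrinks the model and raises test accuracy. Real medical or gene expression datasets would need cross-validation and a more careful choice of selector and classifier.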

Index Terms

Computer Science
Information Sciences


Classification, Binary class, Multiclass, Feature selection algorithm, Medical dataset, High dimensional dataset