Research Article

Survey on Feature Selection for Data Reduction

by R. K. Bania
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 18
Year of Publication: 2014
Authors: R. K. Bania
DOI: 10.5120/16456-2390

R. K. Bania. Survey on Feature Selection for Data Reduction. International Journal of Computer Applications. 94, 18 (May 2014), 1-7. DOI=10.5120/16456-2390

@article{ 10.5120/16456-2390,
author = { R. K. Bania },
title = { Survey on Feature Selection for Data Reduction },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 94 },
number = { 18 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume94/number18/16456-2390/ },
doi = { 10.5120/16456-2390 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A R. K. Bania
%T Survey on Feature Selection for Data Reduction
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 18
%P 1-7
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Advances in storage capability and data collection have led to an information overload: databases are growing in dimension not only in rows but also in columns. Data reduction (DR) plays a vital role as a data preprocessing technique in the area of knowledge discovery from huge collections of data. Feature selection (FS) is one of the well-known data reduction techniques; it reduces the number of attributes in the original data without affecting the main information content. Based on the training data used in different knowledge discovery applications, FS techniques fall into supervised and unsupervised categories. This paper presents an extensive survey of supervised FS techniques, describing the different search approaches, methods, and application areas, together with an outline of a comparative study.
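
As a concrete illustration of the two main FS families the survey covers, the sketch below contrasts a filter method (which scores each feature independently of any learner) with a wrapper method (which searches feature subsets by repeatedly training a classifier). This is a minimal sketch, not taken from the paper: it assumes Python with scikit-learn, and the choice of k = 10 retained features is arbitrary.

  # Minimal filter-vs-wrapper sketch (assumes scikit-learn; k = 10 is arbitrary).
  from sklearn.datasets import load_breast_cancer
  from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
  from sklearn.linear_model import LogisticRegression

  X, y = load_breast_cancer(return_X_y=True)

  # Filter: rank features by mutual information with the class label,
  # independently of any classifier, and keep the top k.
  filter_sel = SelectKBest(score_func=mutual_info_classif, k=10)
  X_filter = filter_sel.fit_transform(X, y)
  print("filter kept", X_filter.shape[1], "of", X.shape[1], "features")

  # Wrapper: recursive feature elimination trains a classifier and
  # repeatedly discards the weakest features until k remain.
  wrapper_sel = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10)
  X_wrapper = wrapper_sel.fit_transform(X, y)
  print("wrapper kept", X_wrapper.shape[1], "of", X.shape[1], "features")

The wrapper run is typically far more expensive, since every elimination step refits the classifier; embedded methods (e.g. L1-penalized models) instead fold the selection into a single training pass.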

References
  1. M. Dash and H. Liu. Feature Selection for Classification. Intelligent Data Analysis, vol. 1, no. 3, pp. 131-156, 1997.
  2. J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, March 2006.
  3. M. R. Sikonja and I. Kononenko. Theoretical and empirical analysis of Relief and ReliefF. Machine Learning, 53:23-69, 2003.
  4. L. Song, A. Smola, A. Gretton, K. Borgwardt, and J. Bedo. Supervised feature selection via dependence estimation. In International Conference on Machine Learning, 2007.
  5. I. L. Kuncheva. Combining pattern classifiers: methods and algorithms, Wiley-interscience Publication, 2004.
  6. P. Pudil, J. Novovicova, J. Kittler. Floating search methods in feature selection. Pattern Recognition Letters, 15(11), pp. 1119-1125, 1994.
  7. P. Somol and P. Pudil. Oscillating search algorithms for feature selection. In ICPR 2000, Los Alamitos, CA, USA: IEEE Computer Society, volume 02, pp. 406-409, 2000.
  8. J. Weston, A. Elisse, B. Schoelkopf, and M. Tipping. Use of the zero norm with linear models and kernel methods. Journal of Machine Learning Research, pp. 1439-1461, 2003.
  9. J. Bala, J. Huang, H. Vafaie, K. DeJong, and H. Wechsler. Hybrid learning using genetic algorithms and decision trees for pattern classification. In IJCAI (1), pp. 719-724, 1995.
  10. G. H. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection problem. In W. W. Cohen and H. Hirsh, editors, Machine Learning: Proceedings of the Eleventh International Conference, New Brunswick, NJ, Rutgers University, pp. 121-129, 1994.
  11. Z. Zhao and H. Liu. Semi-supervised feature selection via spectral analysis. In Proceedings of SIAM International Conference on Data Mining(SDM),2007.
  12. J. Doak. An Evaluation of Feature Selection Methods and Their Application to Computer Security. Technical report, University of California at Davis, Dept. of Computer Science, 1992.
  13. H. Almuallim and T. G. Dietterich. Learning Boolean Concepts in the Presence of Many Irrelevant Features, Artificial Intelligence, vol. 69,nos. 1-2,pp. 279-305,1994.
  14. H. Liu and R. Setiono, A Probabilistic Approach to Feature Selection-A Filter Solution, Proc. 13th Int'l Conf. Machine Learning, pp. 319-327, 1996.
  15. I. Guyon and A. Elisseeff. An introduction to variable and feature selection. J. Mach. Learn. Res., 3, pp. 1157-1182, 2003.
  16. E. Leopold and J. Kindermann. Text Categorization with Support Vector Machines. How to Represent Texts in Input Space? Machine Learning, vol. 46, pp. 423-444, 2002.
  17. D. A. Bell, H. Wang. A formalism for relevance and its application in feature subset selection, Machine Learning 41, pp. 175-195. 2001.
  18. C. Ding and H. Peng. Minimum redundancy feature selection from microarray gene expression data. In Proceedings of the Computational Systems Bioinformatics conference (CSB'03), pp. 523-529, 2003.
  19. C. Lai, M. J. T Reinders, L. J van't Veer, and L. F. A Wessels. A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets. BMC Bioinformatics, 7:235, 2006.
  20. W. Lee, S. J. Stolfo, and K. W. Mok. Adaptive intrusion detection: A data mining approach. AI Review, vol. 14(6), pp. 533-567, 2000.
  21. H. Liu and L. Yu. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17, pp. 491-502, 2005.
  22. Y. Saeys, I. Inza, and P. Larrañaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19), pp. 2507-2517, 2007.
  23. Y. Sun, C. F. Babbs, and E. J. Delp. A comparison of feature selection methods for the detection of breast cancers in mammograms: adaptive sequential floating search vs. genetic algorithm. Conf Proc IEEE Eng Med Biol Soc, 6:6532-6535, 2005.
  24. D. L. Swets and J. J. Weng. Efficient content-based image retrieval using automatic feature selection. In IEEE International Symposium On Computer Vision, pp. 85-90, 1995.
  25. L. Yu, C. Ding, and S. Loscalzo. Stable feature selection via dense feature groups. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
  26. D. W. Aha, D. Kibler, and M. K. Albert. Instance-based learning algorithms. Machine Learning, 6:37-66, 1991.
  27. R. Kohavi and G. H. John. Wrappers for Feature Subset Selection, Artificial Intelligence, vol. 97, nos. 1-2, pp. 273-324, 1997.
  28. R. Caruana and D. Freitag. Greedy attribute selection. In International Conference on Machine Learning, pp. 28-36, 1994.
  29. T. M. Cover and P. E. Hart. Nearest neighbor pattern classifier. IEEE Transactions on Information Theory, 13:21-27, 1967.
  30. K. Kira and L. Rendell. A practical approach to feature selection. In Proc. 9th International Workshop on Machine Learning, pp. 249-256, 1992.
  31. D. Koller and M. Sahami. Toward optimal feature selection. In International Conference on Machine Learning, pp. 284-292, 1996.
  32. J. R. Quinlan. Induction of decision trees. Journal of Machine Learning, vol-1 pp. 81-106, 1986.
  33. J. R. Quinlan. C4.5: Programs for Machine Learning. The Morgan Kaufmann Series in Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.
  34. H. Liu and R. Setiono. Feature Selection and Classification - A Probabilistic Wrapper Approach, Proc. 9th Int'l Conf. Industrial and Eng. Applications of AI and ES, T. Tanaka, S. Ohsuga, and M. Ali, eds., pp. 419-424, 1996.
  35. D. Sun, D. Zhang. Bagging Constraint Score for feature selection with pairwise constraints. Elsevier Pattern Recognition, pp. 2106-2118, 2010.
  36. Q. Hu, H. Zhao, Z. Xie, and, D. Yu. Consistency based attribute reduction. PAKDD 2007, LNAI 4426, Yang (Ed. ), vol. 4426, pp. 96-107, 2007.
  37. P. Pudil, P. Novovicov, N. Choakjarernwanit, J. Kittler. Feature selection based on approximation of class densities by finite mixtures of special type. Pattern Recognition, 28, pp. 1389-1398. 1995.
  38. S. Mika, G. Ratsch, and K. -R. Muller. A Mathematical Programming Approach to the Kernel Fisher Algorithm. Advances in Neural Information Processing Systems,Cambridge, MA, USA, MIT Press. pp. 591-597, 2000.
  39. I. Guyon, J. Weston, S. Barnhill, V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46(1-3), pp. 389-422, 2002.
  40. M. Sebban, R. Nock. A hybrid filter/wrapper approach of feature selection using information theory. Pattern Recognition, 35, pp. 835-846. 2002.
  41. P. Somol, J. Novovicova, P. Pudil. Flexible-hybrid sequential floating search in statistical feature selection. In Structural, Syntactic, and Statistical Pattern Recognition, Springer-Verlag, volume LNCS 4109, pp. 632-639, 2006.
  42. M. A. Esseghir. Effective wrapper-filter hybridization through grasp schemata. In The 4th Workshop on Feature Selection in Data Mining, pp. 45-54, 2010.
  43. S. Ma and J. Huang. Penalized feature selection and classification in bio informatics. Brief Bioinform, 9(5):392-403, Sep 2008.
  44. P. Pudil, J. Novovicova, N. Choakjarernwanit, J. Kittler. Feature selection based on approximation of class densities by finite mixtures of special type. Pattern Recognition, 28, pp. 1389-1398, 1995.
  45. P. M. Narendra and K. Fukunaga. A Branch and Bound Algorithm for Feature Subset Selection, IEEE Trans. Computers, vol. 26, no. 9, pp. 917-922, Sept. 1977.
  46. A. A. Zhigljavsky. Theory of Global Random Search. Kluwer Academic. ISBN 0-7923-1122-1, 1991.
  47. J. Doak, An Evaluation of Feature Selection Methods and Their Application to Computer Security, technical report, University of California at Davis, Dept. Computer Science, 1992.
  48. H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Boston: Kluwer Academic, 1998.
  49. A. L. Blum and P. Langley. Selection of Relevant Features and Examples in Machine Learning, Artificial Intelligence, vol. 97, pp. 245-271,1997.
  50. R. Jensen and Q. Shen. Semantics-Preserving Dimensionality Reduction: Rough and Fuzzy-Rough Based Approaches. IEEE Transactions on Knowledge and Data Engineering, 16(12): 1457-1471, 2004.
  51. R. Jensen and Q. Shen. Tolerance-based and Fuzzy-Rough Feature Selection, Proceedings of the 16th International Conference on Fuzzy Systems pp. 877-882, 2007.
  52. H. Yuan, S. S. Tseng, W. Gangshan and Z. Fuyan. A Two-phase Feature Selection Method using both Filter and Wrapper, IEEE, pp. 132-136, 1999.
  53. Q. Shen, R. Jensen. Rough sets, their extensions and applications. International Journal of Automation and Computing, vol. 4(3), pp. 217-228, July 2007.
  54. S. Das. Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection, Proc. 18th Intl Conf. Machine Learning, pp. 74-81, 2001.
  55. L. Zhuo, J. Zheng, F. Wang, X. Li, B. Ai and J. Qian. A GA based wrapper feature selection method for classification of hyperspectral images using SVM, Remote Sensing and Spatial Information Sciences, Vol. XXXVII, Part B7, pp. 397-402, 2008.
  56. R. Jensen and Q. Shen. A Rough Set-Aided System for Sorting WWW Bookmarks. In N. Zhong et al. (Eds.), Web Intelligence: Research and Development, pp. 95-105, 2001.
  57. M. A. Hall. Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning, Proc. 17th Int'l Conf. Machine Learning, pp. 359-366, 2000.
  58. L. Yu and H. Liu. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, Proc. 20th International Conference. Machine Learning, pp. 856-863, 2003.
  59. S. Chawla. Feature selection, association rules network and theory building. In The 4th Workshop on Feature Selection in Data Mining, 2010.
  60. D. Koller and M. Sahami, Toward Optimal Feature Selection, Proc. 13th Intl Conf. Machine Learning, pp. 284-292, 1996.
  61. E. P. Xing, M. Jordan, and R. M. Karp. Feature Selection for High-Dimensional Genomic Microarray Data,Proc. 15th International Conference. Machine Learning, pp. 601-608, 2001.
  62. G. Brown, A. Pocock, M. J. Zhao, M. Lujan, Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection, Journal of Machine Learning Research vol-13 pp. 27-66. 2012.
  63. R. Gilad-Bachrach, A. Navot, and N. Tishby. Margin based feature selection - theory and algorithms. In Proceedings of the International Conference on Machine Learning (ICML), pp. 43-50, ACM Press, 2004.
  64. Li Yeh Chuang, Chao Hsuan Ke, and Cheng Hong Yang. A Hybrid Both Filter and Wrapper Feature Selection Method for Microarray Classification, Proceedings of the International Multi Conference of Engineers and Computer Scientists 2008.
  65. J. Liang, R. Li, Y. Qian. Distance: A more comprehensible perspective for measures in rough set theory, Knowledge-Based Systems 27, pp. 126-136,2012.
  66. M. Dorigo, Optimization, Learning and Natural Algorithms, PhD thesis, Politecnico di Milano, Italie, 1992.
  67. Ding, L. Chan. Classification of hyperspectral remote sensing images with support vector machines and particle swarm optimization, in: Proceedings of ICIECS'09, pp. 1–52, 2009.
  68. S. Maldonado, R. Weber. A wrapper method for feature selection using SVM, Information Sciences, Elsevier, doi:10.1016/j.ins.2009.02.014, pp. 2208-2217, 2009.
  69. V. Sindhwani, P. Bhattacharya, and S. Rakshit. Information theoretic feature crediting in multiclass SVM. In Proceedings of the first SIAM International Conference on Data Mining, 2001.
  70. A. Al-Ani, A. Alsukker and R. N. Khushaba. Feature subset selection using differential evolution and a wheel based search strategy, Elsevier: Swarm and Evolutionary Computation 9, pp. 15–26, 2013.
Index Terms

Computer Science
Information Sciences

Keywords

Data reduction, Feature selection, Filter, Wrapper, Embedded