Research Article

Feature Subset Selection Algorithm for High-Dimensional Data by using FAST Clustering Approach

Published in April 2014 by Kumaravel. V, Raja. K
International Conference on Knowledge Collaboration in Engineering
Foundation of Computer Science USA
ICKCE - Number 1
April 2014
Authors: Kumaravel. V, Raja. K

Kumaravel. V, Raja. K. Feature Subset Selection Algorithm for High-Dimensional Data by using FAST Clustering Approach. International Conference on Knowledge Collaboration in Engineering. ICKCE, 1 (April 2014), 21-25.

@article{kumaravel2014feature,
author = { Kumaravel. V, Raja. K },
title = { Feature Subset Selection Algorithm for High-Dimensional Data by using FAST Clustering Approach },
journal = { International Conference on Knowledge Collaboration in Engineering },
issue_date = { April 2014 },
volume = { ICKCE },
number = { 1 },
month = { April },
year = { 2014 },
issn = { 0975-8887 },
pages = { 21-25 },
numpages = 5,
url = { /proceedings/ickce/number1/16142-1008/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Knowledge Collaboration in Engineering
%A Kumaravel. V
%A Raja. K
%T Feature Subset Selection Algorithm for High-Dimensional Data by using FAST Clustering Approach
%J International Conference on Knowledge Collaboration in Engineering
%@ 0975-8887
%V ICKCE
%N 1
%P 21-25
%D 2014
%I International Journal of Computer Applications
Abstract

Feature selection is the process of identifying a subset of the most useful features that produces results compatible with those obtained using the original, full set of features. Feature selection algorithms are evaluated by two measures: efficiency and effectiveness. Efficiency concerns the time required to find the feature subset, while effectiveness concerns the quality of that subset. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and evaluated. FAST works in two steps: first, the features are divided into clusters; then, from each cluster, the representative feature most strongly related to the target class is selected. Because features in different clusters are relatively independent, FAST has a high probability of producing a subset of useful and independent features. The performance of FAST is compared with several feature selection algorithms (FCBF, Relief, and CFS), and FAST outperforms them. Results on 35 real-world datasets (image, microarray, and text data) show not only that FAST produces smaller feature subsets but also that it improves classification performance.
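The two-step procedure described in the abstract (cluster the features, then keep one class-relevant representative per cluster) can be illustrated with the Python sketch below. It is a hypothetical, minimal example, not the authors' implementation. This page does not specify the correlation measure or the clustering rule, so the sketch assumes discrete-valued features, uses symmetric uncertainty (SU) as the correlation score, builds a spanning tree with Prim's algorithm on the distance 1 - SU, and cuts any tree edge whose feature-feature SU is lower than the class relevance of both endpoints. The function names `entropy`, `symmetric_uncertainty` and `fast_style_select`, and the `threshold` parameter, are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a correlation score in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: treat them as uncorrelated
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fast_style_select(features, target, threshold=0.0):
    """Hypothetical sketch of two-step, clustering-based feature selection.

    features:  dict mapping feature name -> list of discrete values
    target:    list of discrete class labels, same length as each feature
    threshold: assumed cut-off on SU(feature, class) for the relevance filter
    """
    # Relevance filter (assumption): keep features whose SU with the class
    # is strictly above the threshold.
    rel = {f: symmetric_uncertainty(v, target) for f, v in features.items()}
    nodes = [f for f in features if rel[f] > threshold]
    if not nodes:
        return []
    su = {(a, b): symmetric_uncertainty(features[a], features[b])
          for a in nodes for b in nodes if a != b}

    # Step 1a (assumption): Prim's algorithm on the distance 1 - SU, so the
    # spanning tree links features that are most strongly correlated.
    in_tree, tree_edges = [nodes[0]], []
    while len(in_tree) < len(nodes):
        _, u, v = min((1.0 - su[(a, b)], a, b)
                      for a in in_tree for b in nodes if b not in in_tree)
        tree_edges.append((u, v))
        in_tree.append(v)

    # Step 1b (assumption): cut every tree edge whose feature-feature SU is
    # lower than the class relevance of both endpoints; what remains
    # connected forms the clusters.
    adjacency = {f: [] for f in nodes}
    for u, v in tree_edges:
        if not (su[(u, v)] < rel[u] and su[(u, v)] < rel[v]):
            adjacency[u].append(v)
            adjacency[v].append(u)

    clusters, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one connected component
            f = stack.pop()
            component.append(f)
            for g in adjacency[f]:
                if g not in seen:
                    seen.add(g)
                    stack.append(g)
        clusters.append(component)

    # Step 2: from each cluster keep the feature most relevant to the class.
    return [max(cluster, key=rel.get) for cluster in clusters]


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2, 3, 3]
    data = {
        "x": [0, 0, 0, 0, 1, 1, 1, 1],
        "x_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # redundant duplicate of x
        "y": [0, 0, 1, 1, 0, 0, 1, 1],
        "noise": [0, 1, 0, 1, 0, 1, 0, 1],  # carries no class information
    }
    print(fast_style_select(data, labels))  # ['x', 'y']
```

In the usage example, `noise` carries no information about the class and is dropped by the relevance filter, `x` and `x_copy` are perfectly correlated and end up in one cluster from which only `x` is kept, and `y` is independent of `x`, so it forms its own cluster. Note that this sketch computes SU for every pair of relevant features, so its cost grows quadratically with the number of features that pass the filter.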

References
  1. Liu, H., Motoda, H., and Yu, L. 2004. "Selective Sampling Approach to Active Feature Selection," Artificial Intelligence, vol. 159, nos. 1/2, pp. 49-74.
  2. Guyon, I. and Elisseeff, A. 2003. "An Introduction to Variable and Feature Selection," J. Machine Learning Research, vol. 3, pp. 1157-1182.
  3. Mitchell, T. M. 1982. "Generalization as Search," Artificial Intelligence, vol. 18, no. 2, pp. 203-226.
  4. Dash, M. and Liu, H. 1997. "Feature Selection for Classification," Intelligent Data Analysis, vol. 1, no. 3, pp. 131-156.
  5. Souza, J. 2004. "Feature Selection with a General Hybrid Algorithm," PhD dissertation, Univ. of Ottawa.
  6. Langley, P. 1994. "Selection of Relevant Features in Machine Learning," Proc. AAAI Fall Symp. Relevance, pp. 1-5.
  7. Ng, A. Y. 1998. "On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples," Proc. 15th Int'l Conf. Machine Learning, pp. 404-412.
  8. Das, S. 2001. "Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection," Proc. 18th Int'l Conf. Machine Learning, pp. 74-81.
  9. Xing, E., Jordan, M., and Karp, R. 2001. "Feature Selection for High-Dimensional Genomic Microarray Data," Proc. 18th Int'l Conf. Machine Learning, pp. 601-608.
  10. Pereira, F., Tishby, N., and Lee, L. 1993. "Distributional Clustering of English Words," Proc. 31st Ann. Meeting on Assoc. for Computational Linguistics, pp. 183-190.
  11. Baker, L. D. and McCallum, A. K. 1998. "Distributional Clustering of Words for Text Classification," Proc. 21st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 96-103.
  12. Dhillon, I. S. and Kumar, R. 2003. "A Divisive Information Theoretic Feature Clustering Algorithm for Text Classification," J. Machine Learning Research, vol. 3, pp. 1265-1287.
  13. Jaromczyk, J. W. and Toussaint, G. T. 1992. "Relative Neighborhood Graphs and Their Relatives," Proc. IEEE, vol. 80, no. 9 (Sept.), pp. 1502-1517.
  14. John, G. H., Kohavi, R., and Pfleger, K. 1994. "Irrelevant Features and the Subset Selection Problem," Proc. 11th Int'l Conf. Machine Learning, pp. 121-129.
  15. Forman, G. 2003. "An Extensive Empirical Study of Feature Selection Metrics for Text Classification," J. Machine Learning Research, vol. 3, pp. 1289-1305.
  16. Hall, M. A. 2000. "Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning," Proc. 17th Int'l Conf. Machine Learning, pp. 359-366.
  17. Kononenko, I. 1994. "Estimating Attributes: Analysis and Extensions of RELIEF," Proc. European Conf. Machine Learning, pp. 171-182.
  18. Battiti, R. 1994. "Using Mutual Information for Selecting Features in Supervised Neural Net Learning," IEEE Trans. Neural Networks, vol. 5, no. 4 (July), pp. 537-550.
  19. Hall, M. A. 1999. "Correlation-Based Feature Subset Selection for Machine Learning," PhD dissertation, Univ. of Waikato.
  20. Yu, L. and Liu, H. 2003. "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution," Proc. 20th Int'l Conf. Machine Learning, vol. 20, no. 2, pp. 856-863.
  21. Yu, L. and Liu, H. 2004. "Efficient Feature Selection via Analysis of Relevance and Redundancy," J. Machine Learning Research, vol. 5, pp. 1205-1224.
  22. Fleuret, F. 2004. "Fast Binary Feature Selection with Conditional Mutual Information," J. Machine Learning Research, vol. 5, pp. 1531-1555.
  23. Kohavi, R. and John, G. H. 1997. "Wrappers for Feature Subset Selection," Artificial Intelligence, vol. 97, nos. 1/2, pp. 273-324.
  24. Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. 1988. Numerical Recipes in C. Cambridge Univ. Press.
  25. Almuallim, H. and Dietterich, T. G. 1994. "Learning Boolean Concepts in the Presence of Many Irrelevant Features," Artificial Intelligence, vol. 69, nos. 1/2, pp. 279-305.
  26. Robnik-Sikonja, M. and Kononenko, I. 2003. "Theoretical and Empirical Analysis of Relief and ReliefF," Machine Learning, vol. 53, pp. 23-69.
  27. Dash, M., Liu, H., and Motoda, H. 2000. "Consistency Based Feature Selection," Proc. Fourth Pacific Asia Conf. Knowledge Discovery and Data Mining, pp. 98-109.
  28. Cohen, W. 1995. "Fast Effective Rule Induction," Proc. 12th Int'l Conf. Machine Learning (ICML '95), pp. 115-123.
Index Terms

Computer Science
Information Sciences

Keywords

Feature Selection, Filter Method, Feature Clustering, Graph-based Clustering.