Research Article

Feature Selection using Clustering Approach for Big Data

Published in December 2014 by Harshali D.gangurde
Innovations and Trends in Computer and Communication Engineering
Foundation of Computer Science USA
ITCCE - Number 4
December 2014
Authors: Harshali D.gangurde

Harshali D.gangurde. Feature Selection using Clustering Approach for Big Data. Innovations and Trends in Computer and Communication Engineering. ITCCE, 4 (December 2014), 1-3.

@article{
author = { Harshali D.gangurde },
title = { Feature Selection using Clustering Approach for Big Data },
journal = { Innovations and Trends in Computer and Communication Engineering },
issue_date = { December 2014 },
volume = { ITCCE },
number = { 4 },
month = { December },
year = { 2014 },
issn = 0975-8887,
pages = { 1-3 },
numpages = 3,
url = { /proceedings/itcce/number4/19058-2024/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Innovations and Trends in Computer and Communication Engineering
%A Harshali D.gangurde
%T Feature Selection using Clustering Approach for Big Data
%J Innovations and Trends in Computer and Communication Engineering
%@ 0975-8887
%V ITCCE
%N 4
%P 1-3
%D 2014
%I International Journal of Computer Applications
Abstract

Feature selection has been a productive field of research and development in data mining, machine learning and statistical pattern recognition, and is widely applied in fields such as image retrieval, genomic analysis and text categorization. Feature selection means choosing the most useful features from a given data set by removing irrelevant and redundant features. A clustering approach can make feature selection both efficient and effective: useful features can be selected from big data according to efficiency, measured by time complexity, and effectiveness, measured by the quality of the resulting data. Feature selection reduces the computational complexity of learning and prediction algorithms and saves the cost of measuring non-selected features. The selection can be carried out with a graph-theoretic clustering approach. The features most relevant to the target class are selected from each cluster. The features in each cluster differ from, and are independent of, those in the other clusters.
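
The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes discrete features, uses symmetric uncertainty as the correlation measure, drops features whose relevance to the target falls below a hypothetical threshold, links two features when their mutual correlation is at least as large as the weaker feature's relevance, and treats each connected component of that graph as a cluster. The names `symmetric_uncertainty`, `select_features` and `relevance_threshold` are illustrative.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Normalised mutual information in [0, 1]: 2 * I(X;Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_information = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_information / (hx + hy)


def select_features(columns, target, relevance_threshold=0.1):
    """Return one representative feature name per cluster (assumed scheme)."""
    # Step 1: remove irrelevant features (low correlation with the target).
    relevance = {name: symmetric_uncertainty(col, target)
                 for name, col in columns.items()}
    relevant = [name for name in columns if relevance[name] > relevance_threshold]

    # Step 2: cluster redundant features with a union-find structure.
    parent = {name: name for name in relevant}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for i, a in enumerate(relevant):
        for b in relevant[i + 1:]:
            su_ab = symmetric_uncertainty(columns[a], columns[b])
            # Assumed redundancy rule: the pair is linked when the features
            # tell us at least as much about each other as the weaker one
            # tells us about the target.
            if su_ab >= min(relevance[a], relevance[b]):
                parent[find(a)] = find(b)

    # Step 3: keep the most relevant feature of each cluster.
    clusters = {}
    for name in relevant:
        clusters.setdefault(find(name), []).append(name)
    return sorted(max(members, key=relevance.get) for members in clusters.values())


# Toy data: a four-class target encoded by two independent bits, a noisy
# copy of one bit (redundant), and a feature unrelated to the target.
target = [0, 0, 1, 1, 2, 2, 3, 3]
columns = {
    "high_bit":       [0, 0, 0, 0, 1, 1, 1, 1],
    "high_bit_noisy": [0, 0, 0, 0, 1, 1, 1, 0],
    "low_bit":        [0, 0, 1, 1, 0, 0, 1, 1],
    "noise":          [0, 1, 0, 1, 0, 1, 0, 1],
}
selected = select_features(columns, target)
print(selected)  # ['high_bit', 'low_bit']
```

In this toy run the unrelated feature is discarded as irrelevant, the noisy copy is grouped with the bit it duplicates, and one feature per cluster survives. The pairwise loop is quadratic in the number of relevant features, so a real big-data setting would need a cheaper graph construction than this sketch uses.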

Index Terms

Computer Science
Information Sciences

Keywords

Feature Selection, Clustering