Research Article

An Efficient Distributed Feature Subset Selection Technique on High Dimensional Small Sized Data

Published on March 2017 by Apurva Y. Chaudhari, Satish. S. Banait
Emerging Trends in Computing
Foundation of Computer Science USA
ETC2016 - Number 4

Apurva Y. Chaudhari, Satish. S. Banait. An Efficient Distributed Feature Subset Selection Technique on High Dimensional Small Sized Data. Emerging Trends in Computing. ETC2016, 4 (March 2017), 11-16.

@article{
author = { Apurva Y. Chaudhari and Satish. S. Banait },
title = { An Efficient Distributed Feature Subset Selection Technique on High Dimensional Small Sized Data },
journal = { Emerging Trends in Computing },
issue_date = { March 2017 },
volume = { ETC2016 },
number = { 4 },
month = { March },
year = { 2017 },
issn = { 0975-8887 },
pages = { 11-16 },
numpages = 6,
url = { /proceedings/etc2016/number4/27322-6275/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Emerging Trends in Computing
%A Apurva Y. Chaudhari
%A Satish. S. Banait
%T An Efficient Distributed Feature Subset Selection Technique on High Dimensional Small Sized Data
%J Emerging Trends in Computing
%@ 0975-8887
%V ETC2016
%N 4
%P 11-16
%D 2017
%I International Journal of Computer Applications
Abstract

Feature subset selection is a crucial phase in building accurate classifiers in data mining and machine learning, especially with High Dimensional Small Sized (HDSS) data. Linear discriminant analysis (LDA) can serve as an efficient measure for evaluating a feature subset. When LDA is applied to feature selection on HDSS data with class imbalance, however, it runs into several difficulties: a singular scatter matrix, overwhelming of the minority class, overfitting, and high computational complexity. To address these, a new LDA-based feature selection technique is proposed that places more emphasis on the minority class and uses a novel regularization technique. The main objective is to enhance the performance of the feature subset selection process using LDA in a distributed environment. The sample ratio between the two classes is also determined.
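
The abstract does not give the authors' formulation, so the following is only an illustrative sketch of the general idea: scoring a candidate feature subset with a two-class Fisher (LDA) criterion whose within-class scatter is ridge-regularized so it stays invertible when there are fewer samples than features. The function name `lda_subset_score`, the parameters `reg` and `minority_weight`, and the fixed class weighting are assumptions made for illustration, not the technique proposed in the paper.

```python
import numpy as np

def lda_subset_score(X, y, features, reg=1e-3, minority_weight=0.5):
    """Regularized two-class Fisher criterion for one feature subset.

    Hypothetical helper: `reg` and `minority_weight` are illustrative
    parameters, not values or names taken from the paper.
    """
    Xs = X[:, features]
    classes = np.unique(y)
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    A, B = Xs[y == classes[0]], Xs[y == classes[1]]
    # Treat the smaller class as the minority class.
    X_min, X_maj = (A, B) if len(A) <= len(B) else (B, A)

    def scatter(Z):
        # Per-class covariance estimate (centered scatter / sample count).
        Zc = Z - Z.mean(axis=0)
        return Zc.T @ Zc / len(Z)

    # Fixed class weights (instead of weights proportional to class size)
    # keep the majority class from dominating the within-class scatter.
    Sw = minority_weight * scatter(X_min) + (1.0 - minority_weight) * scatter(X_maj)
    # Ridge term: Sw is singular when samples < features, so add reg * I.
    Sw_reg = Sw + reg * np.eye(len(features))
    diff = X_min.mean(axis=0) - X_maj.mean(axis=0)
    # Fisher criterion: diff^T Sw_reg^{-1} diff (larger = better separation).
    return float(diff @ np.linalg.solve(Sw_reg, diff))
```

A search procedure (for example, greedy forward selection) could call such a score on each candidate subset; in a distributed setting, the candidate subsets could be scored in parallel on separate workers, though how the paper distributes the work is not stated in the abstract.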

Index Terms

Computer Science
Information Sciences

Keywords

Feature Subset Selection, Class Emphasis, HDSS, Classification, Regularization