An Efficient Distributed Feature Subset Selection Technique on High Dimensional Small Sized Data

Research Article

Published on March 2017 by Apurva Y. Chaudhari, Satish. S. Banait

Emerging Trends in Computing

Foundation of Computer Science USA

ETC2016 - Number 4

March 2017

Apurva Y. Chaudhari and Satish. S. Banait. An Efficient Distributed Feature Subset Selection Technique on High Dimensional Small Sized Data. Emerging Trends in Computing, ETC2016, 4 (March 2017), 11-16.

@article{
author = { Apurva Y. Chaudhari and Satish. S. Banait },
title = { An Efficient Distributed Feature Subset Selection Technique on High Dimensional Small Sized Data },
journal = { Emerging Trends in Computing },
issue_date = { March 2017 },
volume = { ETC2016 },
number = { 4 },
month = { March },
year = { 2017 },
issn = { 0975-8887 },
pages = { 11-16 },
numpages = 6,
url = { /proceedings/etc2016/number4/27322-6275/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Proceeding Article
%1 Emerging Trends in Computing
%A Apurva Y. Chaudhari
%A Satish. S. Banait
%T An Efficient Distributed Feature Subset Selection Technique on High Dimensional Small Sized Data
%J Emerging Trends in Computing
%@ 0975-8887
%V ETC2016
%N 4
%P 11-16
%D 2017
%I International Journal of Computer Applications

Abstract

Feature subset selection is a crucial step in building accurate classifiers in data mining and machine learning, especially for High Dimensional Small Sized (HDSS) data, where the number of features far exceeds the number of samples. Linear Discriminant Analysis (LDA) can also be used for feature selection, serving as an efficient measure for evaluating a feature subset. When LDA is applied to feature selection on HDSS data with class imbalance, however, it encounters several difficulties: a singular scatter matrix, the minority class being overwhelmed by the majority class, overfitting, and high computational complexity. To address these problems, a new LDA-based feature selection technique is proposed that places greater emphasis on the minority class and uses a novel regularization technique. The main objective is to improve the performance of the LDA-based feature subset selection process in a distributed environment. The sample ratio between the two classes is also determined.
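The abstract combines three ingredients: an LDA criterion for scoring feature subsets, regularization to cope with a singular scatter matrix, and extra emphasis on the minority class, all run in a distributed setting. The paper's exact formulation is not given here, so the following is only a minimal sketch under stated assumptions. It uses the two-class Fisher criterion J = dᵀ(S_w + αI)⁻¹d, where d is the difference of class means and αI is a ridge term. With n samples, rank(S_w) ≤ n − 2, so S_w cannot be inverted once the number of features exceeds n − 2; the ridge term makes it invertible. The hypothetical `minority_weight` scales the minority class's within-class scatter and is set here to the majority/minority sample ratio. Greedy forward selection and a feature-partitioned "distributed" scheme (simulated with threads) are also assumptions. All function names (`within_scatter`, `fisher_score`, `forward_select`, `distributed_select`) and the weighting scheme are illustrative and are not the authors' method.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def within_scatter(X, y, minority_weight=1.0):
    """Within-class scatter S_w = S_0 + w * S_1 (class 1 = minority, by assumption)."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for label, weight in ((0, 1.0), (1, minority_weight)):
        Xc = X[y == label]
        D = Xc - Xc.mean(axis=0)  # center each class on its own mean
        S += weight * (D.T @ D)
    return S


def fisher_score(X, y, alpha=1e-2, minority_weight=1.0):
    """Regularized two-class LDA criterion J = d^T (S_w + alpha*I)^{-1} d."""
    d = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    Sw = within_scatter(X, y, minority_weight) + alpha * np.eye(X.shape[1])
    return float(d @ np.linalg.solve(Sw, d))  # solve avoids an explicit inverse


def forward_select(X, y, k, **kw):
    """Greedy forward selection: repeatedly add the feature that maximizes J."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: fisher_score(X[:, selected + [j]], y, **kw))
        selected.append(best)
        remaining.remove(best)
    return selected


def distributed_select(X, y, k, n_workers=4, **kw):
    """Feature-partitioned selection: local winners per block, then a global pass."""
    blocks = np.array_split(np.arange(X.shape[1]), n_workers)

    def local(block):
        chosen = forward_select(X[:, block], y, min(k, len(block)), **kw)
        return [int(block[j]) for j in chosen]  # map back to global feature indices

    # Threads stand in for cluster workers in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        candidates = sorted({j for part in pool.map(local, blocks) for j in part})
    chosen = forward_select(X[:, candidates], y, k, **kw)
    return [candidates[j] for j in chosen]


# Synthetic HDSS data: 36 samples, 200 features, 5:1 class imbalance.
rng = np.random.default_rng(0)
n_major, n_minor, n_feat = 30, 6, 200
X = rng.normal(size=(n_major + n_minor, n_feat))
y = np.array([0] * n_major + [1] * n_minor)
X[y == 1, :3] += 2.0  # only features 0-2 carry class signal

ratio = n_major / n_minor  # majority/minority sample ratio, used as the emphasis weight
rank = np.linalg.matrix_rank(within_scatter(X, y, ratio))
print(f"rank of unregularized S_w: {rank} of {n_feat}")  # singular: rank < n_feat
selected = distributed_select(X, y, k=3, minority_weight=ratio)
print("selected features:", selected)
```

Because each scatter matrix is a sum over samples, the majority class dominates S_w. Scaling the minority term by the sample ratio is one simple way to counteract that imbalance. The value of α (1e-2) is arbitrary here, and a real distributed deployment would replace the thread pool with cluster workers.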


Index Terms

Computer Science

Information Sciences

Keywords

Feature Subset Selection, Class Emphasis, HDSS, Classification, Regularization