Performance Analysis of k-NN on High Dimensional Datasets

Pradeep Mewada; Jagdish Patil

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 16 - Number 2

Year of Publication: 2011

Authors: Pradeep Mewada, Jagdish Patil

DOI: 10.5120/1988-2678

Pradeep Mewada, Jagdish Patil. Performance Analysis of k-NN on High Dimensional Datasets. International Journal of Computer Applications. 16, 2 (February 2011), 1-5. DOI=10.5120/1988-2678

@article{ 10.5120/1988-2678,
author = { Pradeep Mewada and Jagdish Patil },
title = { Performance Analysis of k-NN on High Dimensional Datasets },
journal = { International Journal of Computer Applications },
issue_date = { February 2011 },
volume = { 16 },
number = { 2 },
month = { February },
year = { 2011 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume16/number2/1988-2678/ },
doi = { 10.5120/1988-2678 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:03:48.127972+05:30
%A Pradeep Mewada
%A Jagdish Patil
%T Performance Analysis of k-NN on High Dimensional Datasets
%J International Journal of Computer Applications
%@ 0975-8887
%V 16
%N 2
%P 1-5
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Classifying high dimensional datasets remains an open research direction in pattern recognition. High dimensional feature spaces cause scalability problems for machine learning algorithms because the complexity of a high dimensional space increases exponentially with the number of features. Recently, a number of ensemble techniques using different classifiers have been proposed for classifying high dimensional datasets. The task of these techniques is to detect and exploit relevant patterns in data for classification. The k-nearest neighbor (k-NN) algorithm is among the simplest of all machine learning algorithms. This paper discusses various ensemble k-NN techniques for high dimensional datasets. The main techniques are: Random Subspace Classifier (RSM), Divide & Conquer Classification and Optimization using GA (DCC-GA), Random Subsample Ensemble (RSE), and Improving Fusion of Dimensionality Reduction (IF-DR). All of these approaches generate relevant subsets of features from the original set, and the result is obtained from the combined decision of the ensemble classifiers. This paper presents a study of improvements on ensemble k-NN for the classification of high dimensional datasets. The experimental results show that these approaches improve the classification accuracy of the k-NN classifier.
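The shared pattern described in the abstract (each ensemble member sees a subset of the features, and the members' decisions are combined) can be illustrated with a minimal sketch. This is not a reproduction of any of the four surveyed methods as published. It is a generic random-subspace k-NN with simple majority voting, and the function names (`knn_predict`, `random_subspace_knn`), the parameters, and their defaults are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k, features):
    """Classify `query` by majority vote among its k nearest training points,
    using Euclidean distance over the given feature indices only."""
    dists = []
    for x, label in zip(train_X, train_y):
        d = math.sqrt(sum((x[j] - query[j]) ** 2 for j in features))
        dists.append((d, label))
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def random_subspace_knn(train_X, train_y, query, k=3, n_members=15,
                        subspace_size=None, seed=0):
    """Combine k-NN members, each restricted to a random feature subset.

    Majority voting is an assumed combination rule chosen for simplicity;
    the surveyed papers may combine member decisions differently.
    """
    rng = random.Random(seed)
    n_features = len(train_X[0])
    if subspace_size is None:
        # Hypothetical default: each member sees half of the features.
        subspace_size = max(1, n_features // 2)
    member_votes = []
    for _ in range(n_members):
        # Draw a feature subset without replacement for this member.
        features = rng.sample(range(n_features), subspace_size)
        member_votes.append(knn_predict(train_X, train_y, query, k, features))
    return Counter(member_votes).most_common(1)[0][0]
```

Calling `random_subspace_knn(train_X, train_y, query)` with lists of numeric feature vectors returns the label chosen by most members. Restricting each member to fewer features lowers the dimensionality that each distance computation works in, which is the motivation the abstract gives for subset-based ensembles.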

Index Terms

Computer Science

Information Sciences

Keywords

k-Nearest Neighbor, Ensemble Classifiers, High Dimensional Feature Space