Research Article

Performance Analysis of k-NN on High Dimensional Datasets

by Pradeep Mewada, Jagdish Patil
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 16 - Number 2
Year of Publication: 2011
10.5120/1988-2678

Pradeep Mewada, Jagdish Patil. Performance Analysis of k-NN on High Dimensional Datasets. International Journal of Computer Applications. 16, 2 (February 2011), 1-5. DOI=10.5120/1988-2678

@article{ 10.5120/1988-2678,
author = { Pradeep Mewada, Jagdish Patil },
title = { Performance Analysis of k-NN on High Dimensional Datasets },
journal = { International Journal of Computer Applications },
issue_date = { February 2011 },
volume = { 16 },
number = { 2 },
month = { February },
year = { 2011 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume16/number2/1988-2678/ },
doi = { 10.5120/1988-2678 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Pradeep Mewada
%A Jagdish Patil
%T Performance Analysis of k-NN on High Dimensional Datasets
%J International Journal of Computer Applications
%@ 0975-8887
%V 16
%N 2
%P 1-5
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Classification of high-dimensional datasets remains an open research problem in pattern recognition. High-dimensional feature spaces cause scalability problems for machine learning algorithms because the complexity of the space grows exponentially with the number of features. A number of ensemble techniques built from different classifiers have recently been proposed for classifying high-dimensional datasets; their task is to detect and exploit the patterns in the data that are relevant for classification. The k-nearest neighbor (k-NN) algorithm is among the simplest of all machine learning algorithms. This paper discusses several ensemble k-NN techniques for high-dimensional datasets, mainly the Random Subspace Classifier (RSM), Divide & Conquer Classification and Optimization using GA (DCC-GA), the Random Subsample Ensemble (RSE), and Improving Fusion of Dimensionality Reduction (IF-DR). Each of these approaches generates relevant subsets of features from the original feature set and obtains the final result by combining the decisions of the ensemble classifiers. The paper presents a study of these improvements to ensemble k-NN for the classification of high-dimensional datasets, and the experimental results show that these approaches improve the classification accuracy of the k-NN classifier.
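The idea shared by the techniques in the abstract can be sketched in a few lines: several k-NN members, each working on a subset of the original features, with their decisions combined into one prediction. The sketch below is a minimal illustration in the spirit of the Random Subspace Method (RSM), assuming feature subsets drawn uniformly at random, Euclidean distance, and plain majority voting. It is not the authors' implementation; the names `RandomSubspaceKNN` and `knn_predict`, the parameters `n_members`, `subspace_size`, `k` and `seed`, and the toy data are hypothetical choices for illustration. DCC-GA, RSE and IF-DR choose or construct their feature subsets differently, which this sketch does not attempt to reproduce.

```python
import math
import random
from collections import Counter


def distance(a, b, features):
    """Euclidean distance computed only over the given feature indices."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in features))


def knn_predict(train_X, train_y, x, k, features):
    """Plain k-NN: majority label among the k training points nearest to x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda j: distance(train_X[j], x, features))[:k]
    return Counter(train_y[j] for j in nearest).most_common(1)[0][0]


class RandomSubspaceKNN:
    """Hypothetical RSM-style ensemble: each k-NN member sees a random feature subset."""

    def __init__(self, n_members=11, subspace_size=3, k=3, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.k = k
        self.rng = random.Random(seed)  # seeded so the subsets are reproducible

    def fit(self, X, y):
        n_features = len(X[0])
        if not 0 < self.subspace_size <= n_features:
            raise ValueError("subspace_size must be in 1..n_features")
        self.X, self.y = X, y
        # One feature subset per member, drawn without replacement.
        self.subspaces = [sorted(self.rng.sample(range(n_features), self.subspace_size))
                          for _ in range(self.n_members)]
        return self

    def predict_one(self, x):
        # Each member votes using only its own features; the majority label wins.
        votes = Counter(knn_predict(self.X, self.y, x, self.k, feats)
                        for feats in self.subspaces)
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


if __name__ == "__main__":
    # Made-up toy data: 10 features, two well-separated classes of 5 points each.
    n_features = 10
    X = ([[0.1 * i] * n_features for i in range(5)]
         + [[9.0 + 0.1 * i] * n_features for i in range(5)])
    y = [0] * 5 + [1] * 5
    model = RandomSubspaceKNN(n_members=11, subspace_size=3, k=3).fit(X, y)
    print(model.predict([[0.5] * n_features, [9.5] * n_features]))  # → [0, 1]
```

Majority voting is only the simplest way to fuse member decisions. In this sketch the members differ only in which features they see, so the quality of the ensemble depends on how the subsets are chosen, which is where the surveyed methods (random selection, GA-based optimization, dimensionality reduction) differ.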

References
  1. D. R. Wilson and T. R. Martinez, "Improved heterogeneous distance functions," Journal of Artificial Intelligence Research, 6(1):1–34, 1997.
  2. T. M. Mitchell, Machine Learning, McGraw-Hill Science/Engineering/Math, March 1997.
  3. S. D. Bay, "Combining nearest neighbor classifiers through multiple feature subsets," Proceedings of the 17th International Conference on Machine Learning, 1998.
  4. M. L. Raymer et al., "Dimensionality reduction using genetic algorithms," IEEE Transactions on Evolutionary Computation, 4(2):164–171, 2000.
  5. P. Mewada and S. K. Shrivastava, "Review of combining multiple k-nearest neighbor classifiers," International Journal of Computational Intelligence Research & Applications (IJCIRA), July–December 2010, pp. 187–191.
  6. L. Nanni and A. Lumini, "Evolved feature weighting for random subspace classifier," IEEE Transactions on Neural Networks, vol. 19, no. 2, February 2008.
  7. S. Deegalla and H. Bostrom, "Improving fusion of dimensionality reduction methods for nearest neighbor classification," IEEE International Conference on Machine Learning and Applications, 978-0-7695-3926, 2009.
  8. H. Abdi, "Partial least squares regression (PLS-regression)," Thousand Oaks (CA): Sage, pp. 792–795, 2003.
  9. C. Blake, E. Keogh, and C. J. Merz, "UCI Repository of Machine Learning Databases," University of California, Irvine.
  10. A. K. Pujari, Data Mining Techniques, University Press, February 2001.
  11. X. Wu et al., "Top 10 algorithms in data mining," Knowledge and Information Systems, Springer-Verlag London Limited, 2007.
  12. G. Serpen and S. Pathical, "Classification in high-dimensional feature spaces: Random subsample ensemble," IEEE International Conference on Machine Learning and Applications, 2009.
  13. H. Parvin, H. Alizadeh, M. Moshki, B. Minaei-Bidgoli, and N. Mozayani, "Divide & conquer classification and optimization by genetic algorithm," Third International Conference on Convergence and Hybrid Information Technology, IEEE, 978-0-7695-3407, 2008.
  14. R. Sivagaminathan and S. Ramakrishnan, "A hybrid approach for feature subset selection using neural networks and ant colony optimization," Expert Systems with Applications, vol. 33, pp. 49–60, 2007.
  15. O. Okun and H. Priisalu, "Ensembles of k-nearest neighbors and dimensionality reduction," IEEE International Joint Conference on Neural Networks (IJCNN), 978-1-4244-1821, 2008.
Index Terms

Computer Science
Information Sciences

Keywords

k-Nearest Neighbor, Ensemble Classifiers, High Dimensional Feature Space