Research Article

Improving Clustering Performance on High Dimensional Data using Kernel Hubness

Published in May 2014 by R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan
International Conference on Simulations in Computing Nexus
Foundation of Computer Science USA
ICSCN - Number 2
May 2014
Authors: R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan

R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan . Improving Clustering Performance on High Dimensional Data using Kernel Hubness. International Conference on Simulations in Computing Nexus. ICSCN, 2 (May 2014), 27-30.

@article{
author = { R. Shenbakapriya, M. Kalimuthu, P. Sengottuvelan },
title = { Improving Clustering Performance on High Dimensional Data using Kernel Hubness },
journal = { International Conference on Simulations in Computing Nexus },
issue_date = { May 2014 },
volume = { ICSCN },
number = { 2 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 27-30 },
numpages = 4,
url = { /proceedings/icscn/number2/16156-1023/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Simulations in Computing Nexus
%A R. Shenbakapriya
%A M. Kalimuthu
%A P. Sengottuvelan
%T Improving Clustering Performance on High Dimensional Data using Kernel Hubness
%J International Conference on Simulations in Computing Nexus
%@ 0975-8887
%V ICSCN
%N 2
%P 27-30
%D 2014
%I International Journal of Computer Applications
Abstract

Clustering high-dimensional data is difficult because such data become increasingly sparse as dimensionality grows. One inherent property of high-dimensional data is the hubness phenomenon, which can be exploited to cluster such data. Hubness is the tendency of high-dimensional data to contain points (hubs) that occur frequently in the k-nearest-neighbor lists of other data points. These k-nearest-neighbor lists are used to compute the hubness score of each data point. Simple hub-based clustering algorithms detect only hyperspherical clusters, whereas real-world high-dimensional datasets often contain arbitrarily shaped clusters. To improve clustering performance, a new algorithm is proposed that combines kernel mapping with the hubness phenomenon. The proposed algorithm detects arbitrarily shaped clusters and improves cluster quality by minimizing intra-cluster distance and maximizing inter-cluster distance.
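The abstract names two ingredients, hubness scores taken from k-nearest-neighbor lists and kernel mapping, but does not spell out the algorithm. The Python sketch below is one hypothetical way to combine them and is not the authors' method. It counts the hubness score N_k(x) (how many k-NN lists contain x), then seeds clusters with high-hubness points that are not neighbors of each other. That seeding rule is an assumption made up for illustration. Finally it refines the partition with standard kernel k-means using a Gaussian (RBF) kernel, which is a swapped-in technique. All function names and the parameters `k` and `gamma` are illustrative assumptions.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sq_euclidean(x: Point, y: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf(x: Point, y: Point, gamma: float) -> float:
    # Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2): an implicit mapping phi to feature space.
    return math.exp(-gamma * sq_euclidean(x, y))


def knn_lists(points: List[Point], k: int) -> List[List[int]]:
    # k-NN list of every point (the point itself excluded, ties broken by index).
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: (sq_euclidean(points[i], points[j]), j))[:k]
        for i in range(n)
    ]


def hubness_scores(neighbours: List[List[int]]) -> List[int]:
    # N_k(x): number of k-NN lists that contain x; points with large N_k are hubs.
    counts = [0] * len(neighbours)
    for nn in neighbours:
        for j in nn:
            counts[j] += 1
    return counts


def pick_hub_seeds(scores: List[int], neighbours: List[List[int]], n_clusters: int) -> List[int]:
    # Hypothetical heuristic: walk points in decreasing hubness order and keep a candidate
    # only if it is not a k-NN of an already chosen seed (and vice versa).
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    seeds: List[int] = []
    for i in order:
        if len(seeds) == n_clusters:
            break
        if all(i not in neighbours[s] and s not in neighbours[i] for s in seeds):
            seeds.append(i)
    for i in order:  # fallback if the heuristic found too few seeds
        if len(seeds) == n_clusters:
            break
        if i not in seeds:
            seeds.append(i)
    return seeds


def kernel_kmeans(points: List[Point], seeds: List[int], gamma: float, iters: int = 20) -> List[int]:
    n, c_count = len(points), len(seeds)
    K = [[rbf(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    # Initial labels: nearest seed in feature space, ||phi(x)-phi(s)||^2 = K[x][x] + K[s][s] - 2K[x][s].
    labels = [min(range(c_count), key=lambda c: K[i][i] + K[seeds[c]][seeds[c]] - 2 * K[i][seeds[c]])
              for i in range(n)]
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(c_count)]
        # (1/|C|^2) * sum_{a,b in C} K[a][b]: squared norm of the cluster mean in feature space.
        mean_sq = [sum(K[a][b] for a in m for b in m) / len(m) ** 2 if m else 0.0 for m in members]

        def dist(i: int, c: int) -> float:
            # ||phi(x_i) - mean_c||^2 computed with the kernel trick only.
            m = members[c]
            if not m:
                return math.inf
            return K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + mean_sq[c]

        new_labels = [min(range(c_count), key=lambda c: dist(i, c)) for i in range(n)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def kernel_hub_clustering(points: List[Point], n_clusters: int, k: int = 5, gamma: float = 1.0) -> List[int]:
    neighbours = knn_lists(points, k)
    seeds = pick_hub_seeds(hubness_scores(neighbours), neighbours, n_clusters)
    return kernel_kmeans(points, seeds, gamma)
```

For example, `kernel_hub_clustering(points, 2, k=3, gamma=1.0)` on two well-separated groups of five 2-D points labels each group with its own cluster index. With an RBF kernel, ||phi(x) - phi(y)||^2 = 2 - 2 exp(-gamma ||x - y||^2) increases monotonically with Euclidean distance. The k-NN lists, and therefore the hubness scores, are thus identical in input and feature space. The kernel changes the outcome only where feature-space cluster means are used, as in the kernel k-means step. Whether arbitrarily shaped clusters are recovered depends on `gamma` and on the data.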

Index Terms

Computer Science
Information Sciences

Keywords

High Dimensional Data, Hubness Phenomenon, Kernel Mapping, K-nearest Neighbor