Research Article

An Accurate Grid-based PAM Clustering Method for Large Dataset

by Faisal Bin Al Abid, M.a. Mottalib
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 41 - Number 21
Year of Publication: 2012
Authors: Faisal Bin Al Abid, M.a. Mottalib
10.5120/5821-7808

Faisal Bin Al Abid, M.a. Mottalib. An Accurate Grid-based PAM Clustering Method for Large Dataset. International Journal of Computer Applications. 41, 21 (March 2012), 1-6. DOI=10.5120/5821-7808

@article{ 10.5120/5821-7808,
author = { Faisal Bin Al Abid and M.a. Mottalib },
title = { An Accurate Grid-based PAM Clustering Method for Large Dataset },
journal = { International Journal of Computer Applications },
issue_date = { March 2012 },
volume = { 41 },
number = { 21 },
month = { March },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume41/number21/5821-7808/ },
doi = { 10.5120/5821-7808 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:30:10.112970+05:30
%A Faisal Bin Al Abid
%A M.a. Mottalib
%T An Accurate Grid-based PAM Clustering Method for Large Dataset
%J International Journal of Computer Applications
%@ 0975-8887
%V 41
%N 21
%P 1-6
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is the process of grouping similar objects together. Of the many clustering algorithms that have been proposed, K-means has comparatively low time complexity, but it is sensitive to extreme values, which can reduce the accuracy of the resulting clusters. The K-medoids method does not share this sensitivity, but it requires a user-defined value for K; if the number of clusters is chosen incorrectly, the method does not recover the natural number of clusters and accuracy suffers. In this paper, we propose a grid-based clustering method, Grid Multi-dimensional K-medoids (GMK), that uses a cluster validity index. Experimental results show that GMK is more accurate than the existing K-medoids method. The object space is quantized into a number of cells, which decreases the distances between objects within a cluster and contributes to the higher accuracy of the proposed method. The proposed approach therefore provides a natural clustering and scales well to large datasets.
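
The abstract names three ingredients: quantizing the object space into grid cells, K-medoids clustering, and a cluster validity index used to choose K. The Python sketch below shows one way such pieces might fit together. It is not the authors' GMK algorithm, whose details the abstract does not give. The function names, the `cell_size` and `min_pts` parameters, the density rule for dropping sparse cells, the farthest-first seeding, the alternating (non-PAM) medoid update, and the use of the silhouette width as the validity index are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist, floor


def quantize(points, cell_size):
    """Group points by the grid cell (tuple of integer indices) that contains them."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(floor(x / cell_size) for x in p)].append(p)
    return dict(cells)


def dense_points(cells, min_pts):
    """Keep points from cells holding at least min_pts points (assumed density rule)."""
    return [p for members in cells.values() if len(members) >= min_pts for p in members]


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: dist(p, medoids[j])) for p in points]


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids with farthest-first seeding (not PAM's swap search)."""
    medoids = [points[0]]
    while len(medoids) < k:
        medoids.append(max(points, key=lambda p: min(dist(p, m) for m in medoids)))
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j] or [medoids[j]]
            # A medoid is the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda c: sum(dist(c, q) for q in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(points, medoids)


def silhouette(points, labels):
    """Mean silhouette width, used here as a stand-in cluster validity index."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_values):
    """Return (k, labels) with the highest silhouette among the candidate k values."""
    results = {k: k_medoids(points, k)[1] for k in k_values}
    best = max(results, key=lambda k: silhouette(points, results[k]))
    return best, results[best]


# Toy data: three tight groups of four points plus one isolated point.
offsets = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
points.append((50.0, 50.0))  # isolated point, dropped by the density filter

cells = quantize(points, cell_size=5.0)
kept = dense_points(cells, min_pts=2)
best_k, labels = choose_k(kept, range(2, 5))
print(best_k, labels)
```

Trying several values of K and keeping the one with the best validity score removes the need for a user-supplied K, which is the limitation of plain K-medoids that the abstract points out. The pairwise silhouette computation is quadratic in the number of points, so this sketch is only suitable for small inputs.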

Index Terms

Computer Science
Information Sciences

Keywords

Medoid, Grid, Adult Dataset, Partitioning, Cluster Validity Index, Dense Grid, Outlier Detection, Accuracy