An Accurate Grid-based PAM Clustering Method for Large Dataset

Faisal Bin Al Abid; M.a. Mottalib

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 41 - Number 21

Year of Publication: 2012

Authors: Faisal Bin Al Abid, M.a. Mottalib

10.5120/5821-7808

Faisal Bin Al Abid, M.a. Mottalib. An Accurate Grid-based PAM Clustering Method for Large Dataset. International Journal of Computer Applications 41, 21 (March 2012), 1-6. DOI=10.5120/5821-7808

@article{ 10.5120/5821-7808,

author = { Faisal Bin Al Abid, M.a. Mottalib },

title = { An Accurate Grid-based PAM Clustering Method for Large Dataset },

journal = { International Journal of Computer Applications },

issue_date = { March 2012 },

volume = { 41 },

number = { 21 },

month = { March },

year = { 2012 },

issn = { 0975-8887 },

pages = { 1-6 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume41/number21/5821-7808/ },

doi = { 10.5120/5821-7808 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

Abstract

Clustering groups similar objects together. Of the many algorithms proposed, K-means has low time complexity, but it is sensitive to extreme values, which reduces the accuracy of the resulting clusters. The K-medoids method does not share this sensitivity, but it requires a user-defined value of K. If the number of clusters is chosen incorrectly, the method does not recover the natural number of clusters and accuracy suffers. In this paper, we propose a grid-based clustering method, Grid Multi-dimensional K-medoids (GMK), that uses a cluster validity index. The object space is quantized into a number of cells, which decreases the distance between intra-cluster objects and contributes to the higher accuracy of the method. Experimental results show that GMK achieves higher accuracy than the existing K-medoids method. The proposed approach therefore provides a natural clustering that scales well to large datasets.
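Two ideas in the abstract can be illustrated with a minimal Python sketch: the sensitivity of a mean-based centre to an extreme value compared with a medoid, and the quantization of the object space into grid cells. This is not the authors' GMK procedure. The function names, the sample points, and the `cell_size` parameter are assumptions made for illustration only, and the cluster validity index step is not covered.

```python
from collections import defaultdict
from math import dist, floor


def mean_point(points):
    """Coordinate-wise mean, the kind of centre K-means relies on."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))


def medoid(points):
    """Member of `points` with the smallest total distance to all the others."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


def grid_cell(point, cell_size):
    """Index of the grid cell containing `point`.

    `cell_size` is a hypothetical uniform cell width, not a value from the paper.
    """
    return tuple(floor(x / cell_size) for x in point)


def bin_points(points, cell_size):
    """Group points by the grid cell they fall into."""
    cells = defaultdict(list)
    for p in points:
        cells[grid_cell(p, cell_size)].append(p)
    return dict(cells)


# Three nearby points plus one extreme value (illustrative data only).
points = [(1.0, 1.0), (2.0, 2.0), (2.0, 1.0), (40.0, 40.0)]

centre_mean = mean_point(points)      # pulled toward the extreme value
centre_medoid = medoid(points)        # stays on an actual data point
cells = bin_points(points, cell_size=5.0)

print("mean:", centre_mean)
print("medoid:", centre_medoid)
for cell, members in sorted(cells.items()):
    print("cell", cell, "holds", len(members), "point(s)")
```

In this toy data the mean lands far from every real object, while the medoid remains inside the dense group. The sparse cell holding the single extreme point also hints at how cell occupancy could be used to separate dense regions from outliers, although the exact criterion used by GMK is not stated in the abstract.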

Index Terms

Computer Science

Information Sciences

Keywords

Medoid, Grid, Adult Dataset, Partitioning, Cluster Validity Index, Dense Grid, Outlier Detection, Accuracy