Sum of Distance based Algorithm for Clustering Web Data

Neeti Arora; Mahesh Motwani

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 87 - Number 7

Year of Publication: 2014

Authors: Neeti Arora, Mahesh Motwani

DOI: 10.5120/15221-3732

Neeti Arora, Mahesh Motwani. Sum of Distance based Algorithm for Clustering Web Data. International Journal of Computer Applications. 87, 7 (February 2014), 26-30. DOI=10.5120/15221-3732

@article{ 10.5120/15221-3732,
author = { Neeti Arora and Mahesh Motwani },
title = { Sum of Distance based Algorithm for Clustering Web Data },
journal = { International Journal of Computer Applications },
issue_date = { February 2014 },
volume = { 87 },
number = { 7 },
month = { February },
year = { 2014 },
issn = { 0975-8887 },
pages = { 26-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume87/number7/15221-3732/ },
doi = { 10.5120/15221-3732 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:05:19.029560+05:30
%A Neeti Arora
%A Mahesh Motwani
%T Sum of Distance based Algorithm for Clustering Web Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 87
%N 7
%P 26-30
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Clustering is a data mining technique that groups objects with similar characteristics. The similarity criterion is implementation dependent. Clustering analyzes data objects without consulting a known class label or category, i.e., it is an unsupervised data mining technique. K-means is a widely used clustering algorithm that chooses random initial cluster centers (centroids), one for each cluster. The performance of K-means depends strongly on this initial guess, and the final centroids may not be optimal because the algorithm can converge to a local optimum. A good choice of initial centroids is therefore important for K-means. This paper proposes a clustering algorithm that selects initial centroids using the sum of distances from each data object to all other data objects. The proposed algorithm produces better clustering than K-means on both synthetic and real datasets.
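
The abstract does not spell out how the sums of distances are turned into initial centroids, so the following Python sketch is only an illustration of the general idea, not the authors' procedure. It assumes Euclidean distance, takes the object with the smallest sum of distances as the first seed, and then (a hypothetical choice made here, not stated in the abstract) picks each further seed as the object farthest from the seeds already chosen. The function names `distance_sums`, `pick_initial_centroids`, and `kmeans` are invented for this example.

```python
import math


def distance_sums(points):
    """Sum of Euclidean distances from each point to every other point."""
    return [sum(math.dist(p, q) for q in points) for p in points]


def pick_initial_centroids(points, k):
    """Deterministic seeding (illustrative assumption, not the paper's rule).

    The first seed is the point with the smallest distance sum; each
    further seed is the point whose nearest chosen seed is farthest away.
    """
    sums = distance_sums(points)
    chosen = [min(range(len(points)), key=sums.__getitem__)]
    while len(chosen) < k:
        candidates = (i for i in range(len(points)) if i not in chosen)
        chosen.append(max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        ))
    return [points[i] for i in chosen]


def kmeans(points, centroids, max_iter=100):
    """Standard K-means iterations starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Toy data: two small, well-separated groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = pick_initial_centroids(data, k=2)
final_centroids, labels = kmeans(data, seeds)
print(seeds, labels)
```

Because the seeding is deterministic, repeated runs on the same data give the same clustering, which removes the run-to-run variation caused by random initial centroids. Computing all pairwise distances costs quadratic time in the number of objects, so a sketch like this suits small datasets only.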

Index Terms

Computer Science

Information Sciences

Keywords

Clustering, K-means, Recall, Precision