A Novel Approach for Data Clustering using Improved K-means Algorithm

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Authors:
Rishikesh Suryawanshi, Shubha Puthran
DOI: 10.5120/ijca2016909949

Rishikesh Suryawanshi and Shubha Puthran. A Novel Approach for Data Clustering using Improved K-means Algorithm. International Journal of Computer Applications 142(12):13-18, May 2016.

BibTeX

@article{10.5120/ijca2016909949,
	author = {Rishikesh Suryawanshi and Shubha Puthran},
	title = {A Novel Approach for Data Clustering using Improved K-means Algorithm},
	journal = {International Journal of Computer Applications},
	issue_date = {May 2016},
	volume = {142},
	number = {12},
	month = {May},
	year = {2016},
	issn = {0975-8887},
	pages = {13-18},
	numpages = {6},
	url = {http://www.ijcaonline.org/archives/volume142/number12/24947-2016909949},
	doi = {10.5120/ijca2016909949},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

In statistics and data mining, K-means is well known for its efficiency in clustering large data sets. The aim is to group data points into clusters such that similar items fall into the same cluster. K-means is one of the most commonly used algorithms for cluster analysis: it partitions n observations into k clusters, with each observation belonging to the cluster with the nearest mean. The classical algorithm nevertheless has flaws. It becomes inefficient on large data, and improving it remains an open problem. It is also sensitive to the choice of initial centroids, on which the quality of the resulting clusters heavily depends. The proposed work aims to perform data clustering efficiently by reducing the time needed to generate clusters. Our aim is to improve the performance of the existing algorithm using normalization and initial centroid selection techniques. The experimental results show that the proposed algorithm can overcome these shortcomings of the K-means algorithm.
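
As a rough illustration of the two ingredients named in the abstract, the sketch below normalizes the data and then runs K-means from deterministically chosen initial centroids. The abstract does not say which normalization or centroid-selection rule the authors use, so both choices here are assumptions: min-max scaling to [0, 1] and a farthest-first seeding heuristic stand in for the paper's techniques. The function names and the sample data are hypothetical.

```python
import math


def min_max_normalize(points):
    """Rescale every feature (column) to the [0, 1] range."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    # A constant column has zero range; map it to 0.0 instead of dividing by zero.
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def farthest_first_centroids(points, k):
    """Deterministic seeding (an assumed stand-in, not the paper's rule)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest already-chosen centroid is farthest away.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Classical iteration: assign to the nearest centroid, then recompute means."""
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids


# Hypothetical data: the second feature has a much larger scale than the first.
raw = [(1.0, 200.0), (1.2, 220.0), (0.9, 210.0),
       (8.0, 900.0), (8.3, 950.0), (7.9, 880.0)]
normalized = min_max_normalize(raw)
labels, centroids = k_means(normalized, k=2)
```

Normalizing first keeps the large-scale second feature from dominating the Euclidean distance. Because the seeding is deterministic, repeated runs on the same data give the same clusters, which removes one source of the sensitivity to initial centroids described above.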

Keywords

Data Analysis, Clustering, k-means Algorithm, Improved k-means Algorithm