
Review of Clustering Techniques for Finding the Similarity in Articles

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Usha Rani, Shashank Sahu

Usha Rani and Shashank Sahu. Review of Clustering Techniques for Finding the Similarity in Articles. International Journal of Computer Applications 155(6):32-35, December 2016.

	author = {Usha Rani and Shashank Sahu},
	title = {Review of Clustering Techniques for Finding the Similarity in Articles},
	journal = {International Journal of Computer Applications},
	issue_date = {December 2016},
	volume = {155},
	number = {6},
	month = {Dec},
	year = {2016},
	issn = {0975-8887},
	pages = {32-35},
	numpages = {4},
	url = {},
	doi = {10.5120/ijca2016912329},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Clustering is an important technique in data mining. It groups items into clusters so that items in the same cluster are more similar to one another than to items in other clusters. The aim of document clustering is to partition a given set of documents into clusters such that the documents in each cluster are more similar to one another than to the documents in other clusters. This paper reviews various clustering techniques, which can be divided mainly into two groups: hierarchical and partitional clustering.
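The two families named above can be illustrated on a toy corpus. The following is a minimal Python sketch under stated assumptions, not the method of this paper or of any work it reviews: documents are turned into bag-of-words term-frequency vectors, similarity is cosine similarity, the partitional example is a k-means variant that assigns documents by cosine similarity and uses a deterministic farthest-first seeding, and the hierarchical example is bottom-up single-linkage agglomeration. The function names (`tf_vector`, `cosine`, `kmeans`, `agglomerative`) and the four sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Bag-of-words term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iters=20):
    """Partitional clustering: k-means variant that assigns by highest cosine similarity."""
    # Assumption: deterministic farthest-first seeding (random or k-means++ seeding is more common).
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each document joins its most similar centroid.
        new = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        if new == labels:
            break  # converged: no document changed cluster
        labels = new
        # Update step: each centroid becomes the mean vector of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def agglomerative(vectors, k):
    """Hierarchical clustering: bottom-up single-linkage merging until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: similarity of the most similar pair across the two clusters.
                s = max(cosine(vectors[i], vectors[j]) for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so popping b leaves index a valid
    return clusters


# Hypothetical toy corpus; not data from the reviewed papers.
docs = [
    "k-means partitions data into k clusters",
    "k-means clusters data by nearest centroid",
    "hierarchical clustering merges clusters step by step",
    "agglomerative hierarchical clustering merges nearest clusters",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
vecs = [tf_vector(d, vocab) for d in docs]

print(kmeans(vecs, 2))         # [0, 0, 1, 1]
print(agglomerative(vecs, 2))  # [[0, 1], [2, 3]]
```

Both approaches separate the two k-means sentences from the two hierarchical-clustering sentences, but they get there differently: k-means needs the number of clusters and a seeding rule up front and iteratively refines one flat partition, while the agglomerative loop records a sequence of merges that can be stopped at any number of clusters. The naive pair search shown here costs on the order of n³ similarity computations for n documents, so it is only suitable for small examples.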




Keywords: Clustering, Hierarchical clustering, Partitional clustering.