Clustering Techniques and the Similarity Measures used in Clustering: A Survey

Jasmine Irani; Nitin Pise; Madhura Phatak

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 134 - Number 7

Year of Publication: 2016

Authors: Jasmine Irani, Nitin Pise, Madhura Phatak

10.5120/ijca2016907841

Jasmine Irani, Nitin Pise, Madhura Phatak. Clustering Techniques and the Similarity Measures used in Clustering: A Survey. International Journal of Computer Applications. 134, 7 (January 2016), 9-14. DOI=10.5120/ijca2016907841

@article{ 10.5120/ijca2016907841,
author = { Jasmine Irani and Nitin Pise and Madhura Phatak },
title = { Clustering Techniques and the Similarity Measures used in Clustering: A Survey },
journal = { International Journal of Computer Applications },
issue_date = { January 2016 },
volume = { 134 },
number = { 7 },
month = { January },
year = { 2016 },
issn = { 0975-8887 },
pages = { 9-14 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume134/number7/23925-2016907841/ },
doi = { 10.5120/ijca2016907841 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:33:30.375327+05:30
%A Jasmine Irani
%A Nitin Pise
%A Madhura Phatak
%T Clustering Techniques and the Similarity Measures used in Clustering: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 134
%N 7
%P 9-14
%D 2016
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Clustering is an unsupervised learning technique that groups a set of objects (patterns) into clusters based on similarity, so that objects in the same cluster are as similar as possible, whereas objects in different clusters are as dissimilar as possible. A typical clustering technique uses a similarity function to compare data items. This paper surveys various clustering techniques and the current similarity measures used in distance-based clustering, explains the limitations of the existing clustering techniques, and proposes that combining the advantages of the existing systems can help overcome their limitations.
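As a minimal illustration of the role a similarity function plays in distance-based clustering, the following Python sketch implements three widely used measures (Euclidean distance, cosine similarity and the Jaccard coefficient) and a single nearest-centroid assignment step of the kind used in k-means. The function names and the sample data are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not reproduce any specific method the survey covers.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(s, t):
    """Size of the intersection divided by the size of the union of two sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(s & t) / len(s | t)


def assign_to_nearest(points, centroids, distance=euclidean_distance):
    """Return, for each point, the index of the closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
        for p in points
    ]


if __name__ == "__main__":
    # Hypothetical sample data: two well-separated groups in the plane.
    points = [(0, 0), (1, 1), (9, 9), (10, 10)]
    centroids = [(0, 0), (10, 10)]
    print(assign_to_nearest(points, centroids))
    print(jaccard_coefficient({"a", "b", "c"}, {"b", "c", "d"}))
```

Passing a different function as `distance` changes which points are grouped together, which is why the choice of similarity measure matters as much as the choice of clustering algorithm.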

Index Terms

Computer Science

Information Sciences

Keywords

Pattern based similarity, negative data clustering, similarity measures.