Research Article

Effective Clustering for Large Datasets using Density-Based Clustering via Message Passing

by Siddharth Dixit
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 48
Year of Publication: 2025
Authors: Siddharth Dixit
DOI: 10.5120/ijca2025925809

Siddharth Dixit. Effective Clustering for Large Datasets using Density-Based Clustering via Message Passing. International Journal of Computer Applications. 187, 48 (Oct 2025), 28-39. DOI=10.5120/ijca2025925809

@article{ 10.5120/ijca2025925809,
author = { Siddharth Dixit },
title = { Effective Clustering for Large Datasets using Density-Based Clustering via Message Passing },
journal = { International Journal of Computer Applications },
issue_date = { Oct 2025 },
volume = { 187 },
number = { 48 },
month = { Oct },
year = { 2025 },
issn = { 0975-8887 },
pages = { 28-39 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number48/effective-clustering-for-large-datasets-using-density-based-clustering-via-message-passing/ },
doi = { 10.5120/ijca2025925809 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-10-23T00:18:16+05:30
%A Siddharth Dixit
%T Effective Clustering for Large Datasets using Density-Based Clustering via Message Passing
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 48
%P 28-39
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Density-based clustering remains a significant area of research in data science, particularly given the increasing prevalence of high-dimensional datasets with varying densities. Many existing clustering approaches struggle to effectively handle datasets that contain regions of high density surrounded by sparse areas. This study introduces a novel clustering algorithm based on the concept of mutual K-nearest neighbor relationships, designed to overcome these limitations. The proposed method requires only a single input parameter, demonstrates strong performance on high-dimensional, density-based datasets, and is computationally efficient. Furthermore, the algorithm’s practical applications are illustrated through its potential to enhance search and retrieval processes within vector databases.
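The abstract names mutual K-nearest neighbor relationships as the basis of the method but does not give the algorithm itself. As a rough illustration of that underlying idea only, the sketch below links two points when each appears in the other's k-nearest-neighbor list and then reads clusters off as connected components of the resulting graph. This is not the paper's algorithm: the message-passing step is not reproduced, and the function names, the use of Euclidean distance, the choice of k as the single parameter, and the convention of labeling points without any mutual neighbor as -1 are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Set


def knn_sets(points: Sequence[Sequence[float]], k: int) -> List[Set[int]]:
    """For each point, return the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by distance to p; ties are broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def mutual_knn_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Label points by connected components of the mutual k-NN graph.

    Points with no mutual neighbor keep the label -1 (treated as outliers;
    this convention is an assumption of the sketch, not taken from the paper).
    """
    nn = knn_sets(points, k)
    n = len(points)
    # i and j are linked only if each lists the other among its k nearest.
    mutual = [{j for j in nn[i] if i in nn[j]} for i in range(n)]

    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1 or not mutual[start]:
            continue
        # Depth-first traversal over mutual edges assigns one label per component.
        stack = [start]
        labels[start] = next_label
        while stack:
            u = stack.pop()
            for v in mutual[u]:
                if labels[v] == -1:
                    labels[v] = next_label
                    stack.append(v)
        next_label += 1
    return labels


points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # dense group A
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),  # dense group B
    (20.0, 20.0),                        # isolated point
]
labels = mutual_knn_clusters(points, k=2)
print(labels)
```

In this toy input the isolated point still has two nearest neighbors, but neither of them lists it in return, so it forms no mutual edge and stays unlabeled. That asymmetry is what lets a mutual k-NN criterion separate dense regions from surrounding sparse points without a global distance threshold. The brute-force neighbor search here is quadratic in the number of points; a large dataset would need a spatial or approximate nearest-neighbor index instead.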

References
  1. X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008.
  2. P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Library of Congress, 2006.
  3. P.-N. Tan, M. Steinbach, and V. Kumar, “Data mining cluster analysis: Basic concepts and algorithms,” 2013.
  4. Z. Hu and R. Bhatnagar, “Clustering algorithm based on mutual k-nearest neighbor relationships,” Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 5, no. 2, pp. 100–145, 2012.
  5. D. Sardana and R. Bhatnagar, “Graph clustering using mutual k-nearest neighbors,” in Active Media Technology, pp. 35–48, Springer International Publishing, 2014.
  6. L. Ertöz, M. Steinbach, and V. Kumar, “A new shared nearest neighbor clustering algorithm and its applications,” in Workshop on Clustering High Dimensional Data and its Applications at 2nd SIAM International Conference on Data Mining, pp. 105–115, Apr. 2002.
  7. M. A. Wong and T. Lane, “A kth nearest neighbour clustering procedure,” in Computer Science and Statistics: Proceedings of the 13th Symposium on the Interface, pp. 308–311, Springer US, Jan. 1981.
  8. H. Kriegel et al., “Density-based clustering,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, no. 3, pp. 231–240, 2011.
  9. L. Ertöz, M. Steinbach, and V. Kumar, “Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data,” in SDM, 2003.
  10. C. C. Aggarwal, A. Hinneburg, and D. A. Keim, On the surprising behavior of distance metrics in high dimensional space. Springer Berlin Heidelberg, 2001.
  11. B. J. Frey and D. Dueck, “Clustering by passing messages between data points,” Science, vol. 315, no. 5814, pp. 972–976, 2007.
  12. Z. Hu, Multi-Domain Clustering on Real-Valued Datasets. PhD thesis, University of Cincinnati, 2011. https://etd.ohiolink.edu/.
  13. M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in KDD Workshop on Text Mining, vol. 400, 2000.
Index Terms

Computer Science
Information Sciences

Keywords

Clustering; Mutual 𝑘-Nearest Neighbor; Density-Based Methods; Outlier Detection; Vector Databases; Data Mining