Research Article

RCRDE: A Method for Reducing the Rate of Re-Clustering, using Replicated Data Eliminate Algorithm

by Fateme Rashidi, Arash Ghorbannia Delavar, Fateme Heidari Soureshjani, Ali Broumandnia
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 69 - Number 25
Year of Publication: 2013
DOI: 10.5120/12128-8472

Fateme Rashidi, Arash Ghorbannia Delavar, Fateme Heidari Soureshjani, Ali Broumandnia. RCRDE: A Method for Reducing the Rate of Re-Clustering, using Replicated Data Eliminate Algorithm. International Journal of Computer Applications. 69, 25 (May 2013), 13-20. DOI=10.5120/12128-8472

@article{ 10.5120/12128-8472,
author = { Fateme Rashidi and Arash Ghorbannia Delavar and Fateme Heidari Soureshjani and Ali Broumandnia },
title = { RCRDE: A Method for Reducing the Rate of Re-Clustering, using Replicated Data Eliminate Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { May 2013 },
volume = { 69 },
number = { 25 },
month = { May },
year = { 2013 },
issn = { 0975-8887 },
pages = { 13-20 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume69/number25/12128-8472/ },
doi = { 10.5120/12128-8472 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:31:17.046902+05:30
%A Fateme Rashidi
%A Arash Ghorbannia Delavar
%A Fateme Heidari Soureshjani
%A Ali Broumandnia
%T RCRDE: A Method for Reducing the Rate of Re-Clustering, using Replicated Data Eliminate Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 69
%N 25
%P 13-20
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper explores a way to reduce the rate of re-clustering and speed up the clustering process on categorical time-evolving data. The method introduces two algorithms, RDE (Replicated Data Elimination) and RCRDE. The RDE algorithm eliminates the repeated examination of replicated data and uses counters to keep track of this data. Hence the number of windows created via the sliding window technique is limited, which decreases the number of executions of the clustering algorithm. The RCRDE algorithm, based on the MARDL (MAximal Resemblance Data Labeling) framework, decides whether to perform re-clustering or to modify the previous clustering results. The presented method is independent of the type of clustering algorithm, and any kind of categorical clustering algorithm can be used. According to the results obtained on different data sets, this method performs well in practice and facilitates clustering on categorical data. This method can also be used to cluster a very large static categorical database with higher quality than previous work.
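The idea behind RDE can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact procedure, so the sketch assumes that "replicated data" means consecutive identical records, that each run is kept once with a counter, and that the sliding window is a fixed-size, non-overlapping split. The names `eliminate_replicated` and `sliding_windows` are hypothetical.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

Record = Tuple[Hashable, ...]


def eliminate_replicated(records: Iterable[Record]) -> List[Tuple[Record, int]]:
    """Collapse each run of consecutive identical records into (record, count).

    Assumption: only back-to-back repeats are merged; the paper may define
    replication differently.
    """
    compressed: List[Tuple[Record, int]] = []
    for rec in records:
        if compressed and compressed[-1][0] == rec:
            prev, count = compressed[-1]
            compressed[-1] = (prev, count + 1)  # bump the counter, keep one copy
        else:
            compressed.append((rec, 1))
    return compressed


def sliding_windows(items: Sequence, size: int) -> List[Sequence]:
    """Split items into consecutive, non-overlapping windows of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical categorical stream with many repeated records.
stream = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("a", "x")]

compressed = eliminate_replicated(stream)
windows_raw = sliding_windows(stream, 2)      # windows over the raw stream
windows_rde = sliding_windows(compressed, 2)  # windows after elimination

print(len(windows_raw), len(windows_rde))
```

With this toy stream the raw data fills three windows while the compressed data fills two, so a clustering algorithm run once per window would be invoked fewer times. The counters preserve how often each record occurred, so no information about frequencies is lost.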

Index Terms

Computer Science
Information Sciences

Keywords

Categorical time-evolving data, clustering, data labeling, drifting-concept detecting