Research Article

Knowledge Assisted Visualization for Imbalanced Data Clustering

Published on February 2013 by P. Alagambigai, K. Thangavel
International Conference on Communication, Computing and Information Technology
Foundation of Computer Science USA
ICCCMIT - Number 2
February 2013

P. Alagambigai, K. Thangavel. Knowledge Assisted Visualization for Imbalanced Data Clustering. International Conference on Communication, Computing and Information Technology. ICCCMIT, 2 (February 2013), 6-13.

@article{
author = { P. Alagambigai and K. Thangavel },
title = { Knowledge Assisted Visualization for Imbalanced Data Clustering },
journal = { International Conference on Communication, Computing and Information Technology },
issue_date = { February 2013 },
volume = { ICCCMIT },
number = { 2 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 6-13 },
numpages = 8,
url = { /specialissues/icccmit/number2/10330-1014/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Special Issue Article
%1 International Conference on Communication, Computing and Information Technology
%A P. Alagambigai
%A K. Thangavel
%T Knowledge Assisted Visualization for Imbalanced Data Clustering
%J International Conference on Communication, Computing and Information Technology
%@ 0975-8887
%V ICCCMIT
%N 2
%P 6-13
%D 2013
%I International Journal of Computer Applications
Abstract

A common challenge for many data clustering techniques is data complexity, which leads to issues such as overlapping, lack of representative data, and class imbalance. These issues can degrade the clustering process, and the situation worsens when the class imbalance is very high. Clustering such imbalanced data sets requires a better understanding of the data set and efficient clustering algorithms, which can be achieved by integrating suitable domain intelligence into the clustering process. In this paper, a Knowledge Assisted Visualization framework is proposed for imbalanced data clustering and validation. The framework integrates an efficient visual clustering framework with domain intelligence acquired from domain experts and users. An experimental analysis is carried out over a wide range of highly imbalanced data sets. The experiments demonstrate that the proposed method works well with imbalanced data sets and eases cluster identification and validation.
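The abstract does not say how "very high" class imbalance is measured. One common convention, assumed here for illustration and not taken from the paper, is the imbalance ratio: the size of the largest class divided by the size of the smallest. The function name `imbalance_ratio` and the sample labels below are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count.

    This is an assumed, commonly used measure of class imbalance;
    the paper's own measure is not given in the abstract.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: 95 majority points and 5 minority points.
labels = ["majority"] * 95 + ["minority"] * 5
print(imbalance_ratio(labels))  # 19.0
```

A ratio of 1.0 indicates perfectly balanced classes; larger values indicate that the minority class is increasingly easy for a clustering algorithm to absorb into a larger cluster.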

Index Terms

Computer Science
Information Sciences

Keywords

Data Mining, Class Imbalance, Interactive Clustering, Knowledge Assisted Visualization, Visual Clustering