Knowledge Assisted Visualization for Imbalanced Data Clustering

IJCA Special Issue on International Conference on Communication, Computing and Information Technology
© 2013 by IJCA Journal
ICCCMIT - Number 2
Year of Publication: 2013
Authors:
P. Alagambigai
K. Thangavel

P Alagambigai and K Thangavel. Article: Knowledge Assisted Visualization for Imbalanced Data Clustering. IJCA Special Issue on International Conference on Communication, Computing and Information Technology ICCCMIT(2):6-13, February 2013. Full text available. BibTeX

@article{key:article,
	author = {P. Alagambigai and K. Thangavel},
	title = {Article: Knowledge Assisted Visualization for Imbalanced Data Clustering},
	journal = {IJCA Special Issue on International Conference on Communication, Computing and Information Technology},
	year = {2013},
	volume = {ICCCMIT},
	number = {2},
	pages = {6-13},
	month = {February},
	note = {Full text available}
}

Abstract

A common challenge faced by many data clustering techniques is data complexity, which leads to issues such as overlapping, lack of representative data, and class imbalance. These issues can degrade the clustering process, and the situation worsens when the class imbalance is very high. Clustering such imbalanced data sets requires a better understanding of the data and efficient clustering algorithms, which can be achieved by integrating suitable domain intelligence into the clustering process. This paper proposes a Knowledge Assisted Visualization framework for imbalanced data clustering and validation. The framework integrates an efficient visual clustering framework with suitable domain intelligence acquired from domain experts and users. An experimental analysis is carried out over a wide range of highly imbalanced data sets. The experiments demonstrate that the proposed method works well with imbalanced data sets and eases cluster identification and validation.
