Research Article

Clustering Multi-Attribute Uncertain Data using Probability Distribution

by Kulkarni V.v, Bag .v.v
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 102 - Number 5
Year of Publication: 2014
Authors: Kulkarni V.v, Bag .v.v
10.5120/17812-8641

Kulkarni V.v, Bag .v.v. Clustering Multi-Attribute Uncertain Data using Probability Distribution. International Journal of Computer Applications. 102, 5 (September 2014), 28-32. DOI=10.5120/17812-8641

@article{ 10.5120/17812-8641,
author = { Kulkarni V.v, Bag .v.v },
title = { Clustering Multi-Attribute Uncertain Data using Probability Distribution },
journal = { International Journal of Computer Applications },
issue_date = { September 2014 },
volume = { 102 },
number = { 5 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 28-32 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume102/number5/17812-8641/ },
doi = { 10.5120/17812-8641 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Clustering is an unsupervised classification technique that groups a set of abstract objects into classes of similar objects. Clustering uncertain data is one of the essential tasks in mining uncertain data. Uncertain data typically arises in areas such as sensor networks, weather data, and customer rating data. Earlier methods for clustering uncertain data based on probability distributions use the Kullback-Leibler divergence to measure the similarity between two uncertain objects. In this paper, uncertain objects are modeled in the discrete domain, where each uncertain object is treated as a discrete random variable. The Jensen-Shannon divergence is used to measure the similarity between two uncertain objects and is integrated into partitioning and density-based clustering approaches. Experiments are performed to verify the effectiveness and efficiency of the developed model, and the results are on par with existing approaches.
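
The abstract does not give the exact formulation used in the paper, so the following is only a minimal sketch of the standard Jensen-Shannon divergence between two uncertain objects, each represented as a discrete probability distribution over the same set of values. The function names, the choice of base-2 logarithms, and the sample distributions are assumptions made for illustration; they are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    p and q are probability lists over the same discrete support.
    Terms with p_i == 0 contribute nothing. The caller must ensure
    q_i > 0 wherever p_i > 0, otherwise a division by zero occurs.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between p and q in bits.

    Uses the mixture m = (p + q) / 2, so m_i > 0 whenever p_i > 0 or
    q_i > 0. Unlike the KL divergence, the result is symmetric and,
    with base-2 logarithms, bounded between 0 and 1.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical uncertain objects: distributions over three discrete values.
obj_a = [0.5, 0.5, 0.0]
obj_b = [0.0, 0.5, 0.5]
distance = js_divergence(obj_a, obj_b)
```

One reason to prefer this measure in a clustering algorithm is that the Kullback-Leibler divergence is asymmetric and undefined when one object assigns zero probability to a value the other can take, whereas the Jensen-Shannon divergence stays finite and symmetric in that case.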

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Uncertain Data, Discrete Domain, Multi-Attribute Data