Research Article

Cluster based Outlier Detection

by Pranjali Kasture, Jayant Gadge
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 58 - Number 10
Year of Publication: 2012
Authors: Pranjali Kasture, Jayant Gadge
10.5120/9317-3549

Pranjali Kasture, Jayant Gadge. Cluster based Outlier Detection. International Journal of Computer Applications. 58, 10 (November 2012), 11-15. DOI=10.5120/9317-3549

@article{ 10.5120/9317-3549,
author = { Pranjali Kasture, Jayant Gadge },
title = { Cluster based Outlier Detection },
journal = { International Journal of Computer Applications },
issue_date = { November 2012 },
volume = { 58 },
number = { 10 },
month = { November },
year = { 2012 },
issn = { 0975-8887 },
pages = { 11-15 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume58/number10/9317-3549/ },
doi = { 10.5120/9317-3549 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Outlier detection is a fundamental problem in data mining; it is used to detect and remove anomalous objects from data. The proposed approach to outlier detection has three steps: clustering, pruning, and computing an outlier score. For clustering, the k-means algorithm is used, which partitions the dataset into a given number of clusters. In the pruning step, points that are close to the centroid of their cluster, according to a distance measure, are pruned. For the unpruned points, the local distance-based outlier factor (LDOF) is calculated. LDOF measures how much a point deviates from its neighbors. A high LDOF value indicates that the point deviates strongly from its neighbors and is therefore likely to be an outlier.
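
The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the exact pruning rule, so the sketch assumes that a point is pruned when its distance to its own centroid is below the mean distance within that cluster. LDOF is taken as the mean distance from a point to its k nearest neighbors divided by the mean pairwise distance among those neighbors, and neighbors are assumed to be drawn from the full dataset. All function names (`kmeans`, `prune_mask`, `ldof`, `top_outliers`) and default parameter values are hypothetical.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def prune_mask(X, centroids, labels):
    """True for points kept as outlier candidates.

    Assumed rule (not taken from the abstract): keep a point when its
    distance to its own centroid is at least that cluster's mean distance.
    """
    dist = np.linalg.norm(X - centroids[labels], axis=1)
    keep = np.zeros(len(X), dtype=bool)
    for c in range(len(centroids)):
        in_c = labels == c
        if in_c.any():
            keep[in_c] = dist[in_c] >= dist[in_c].mean()
    return keep


def ldof(X, idx, k):
    """LDOF of X[idx]: mean distance to its k nearest neighbors divided by
    the mean pairwise distance among those neighbors (requires k >= 2)."""
    d = np.linalg.norm(X - X[idx], axis=1)
    d[idx] = np.inf  # exclude the point itself
    nn = np.argsort(d)[:k]
    knn_dist = d[nn].mean()
    nbrs = X[nn]
    pair = np.linalg.norm(nbrs[:, None, :] - nbrs[None, :, :], axis=2)
    # The diagonal is zero, so the sum covers the k*(k-1) ordered pairs.
    # If all neighbors coincide, this is zero and the ratio becomes inf.
    inner_dist = pair.sum() / (k * (k - 1))
    return knn_dist / inner_dist


def top_outliers(X, n_clusters=2, k=5, top_n=1):
    """Cluster, prune, then rank the unpruned points by LDOF (descending)."""
    X = np.asarray(X, dtype=float)
    centroids, labels = kmeans(X, n_clusters)
    candidates = np.flatnonzero(prune_mask(X, centroids, labels))
    scores = np.array([ldof(X, i, k) for i in candidates])
    order = np.argsort(scores)[::-1][:top_n]
    return candidates[order], scores[order]


if __name__ == "__main__":
    # Synthetic data: two tight groups plus one isolated point (index 80).
    rng = np.random.default_rng(1)
    X = np.vstack([
        rng.normal(0.0, 0.5, (40, 2)),
        rng.normal(10.0, 0.5, (40, 2)),
        [[-10.0, -10.0]],
    ])
    idx, scores = top_outliers(X, n_clusters=2, k=5, top_n=3)
    print(idx, scores)
```

Pruning before scoring means that LDOF, the most expensive step, is evaluated only for points far from their centroid, which is the efficiency argument the abstract implies.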

Index Terms

Computer Science
Information Sciences

Keywords

Outlier, cluster, pruning, outlier score, k nearest neighbor