Research Article

Outlier Detection using Improved Genetic K-means

by M. H. Marghny, Ahmed I. Taloba
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 28 - Number 11
Year of Publication: 2011
10.5120/3458-4723

M. H. Marghny, Ahmed I. Taloba. Outlier Detection using Improved Genetic K-means. International Journal of Computer Applications. 28, 11 (August 2011), 33-36. DOI=10.5120/3458-4723

@article{ 10.5120/3458-4723,
author = { M. H. Marghny and Ahmed I. Taloba },
title = { Outlier Detection using Improved Genetic K-means },
journal = { International Journal of Computer Applications },
issue_date = { August 2011 },
volume = { 28 },
number = { 11 },
month = { August },
year = { 2011 },
issn = { 0975-8887 },
pages = { 33-36 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume28/number11/3458-4723/ },
doi = { 10.5120/3458-4723 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:14:33.244841+05:30
%A M. H. Marghny
%A Ahmed I. Taloba
%T Outlier Detection using Improved Genetic K-means
%J International Journal of Computer Applications
%@ 0975-8887
%V 28
%N 11
%P 33-36
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In some cases, the outlier detection problem is similar to the classification problem. For example, clustering-based outlier detection algorithms aim to find both clusters and outliers, and the outliers are often regarded as noise that should be removed to make the clustering more reliable. In this article, we present an algorithm that performs outlier detection and data clustering simultaneously. The algorithm improves the estimation of the centroids of the generative distribution during clustering and outlier discovery. It consists of two stages: the first runs the improved genetic k-means (IGK) algorithm, and the second iteratively removes the vectors that are far from their cluster centroids.
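
The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the genetic operators of IGK, so plain Lloyd k-means stands in for the first stage, and the removal rule (drop a vector whose distance to its centroid exceeds `factor` times the mean distance) is an assumption chosen for illustration. The names `assign`, `kmeans`, `cluster_and_remove_outliers`, `init_centroids`, `factor`, and `max_rounds` are all hypothetical.

```python
import math


def assign(points, centroids):
    """Return, for each point, (index of nearest centroid, distance to it)."""
    result = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]
        j = min(range(len(centroids)), key=dists.__getitem__)
        result.append((j, dists[j]))
    return result


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd k-means; a stand-in for the IGK stage, not IGK itself."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = [j for j, _ in assign(points, centroids)]
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids


def cluster_and_remove_outliers(points, init_centroids, factor=2.0, max_rounds=10):
    """Stage 1: cluster. Stage 2: drop far vectors, re-cluster, repeat."""
    kept = [tuple(p) for p in points]
    outliers = []
    centroids = kmeans(kept, init_centroids)
    for _ in range(max_rounds):
        dists = [d for _, d in assign(kept, centroids)]
        # Assumed rule (not from the paper): a vector is "far" if its distance
        # exceeds `factor` times the mean point-to-centroid distance.
        cutoff = factor * sum(dists) / len(dists)
        far = [p for p, d in zip(kept, dists) if d > cutoff]
        if not far:
            break
        outliers.extend(far)
        kept = [p for p, d in zip(kept, dists) if d <= cutoff]
        # Re-estimate centroids without the removed vectors.
        centroids = kmeans(kept, centroids)
    return centroids, kept, outliers
```

For example, calling `cluster_and_remove_outliers` on two tight groups of points near (0, 0) and (10, 10) plus a single distant vector at (50, 50), with initial centroids `[(0, 0), (10, 10)]`, flags the distant vector as an outlier and then re-estimates the second centroid from the remaining points, so the outlier no longer pulls it away from its group.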

Index Terms

Computer Science
Information Sciences

Keywords

Outlier detection
Genetic algorithms
Clustering
K-means algorithm
Improved Genetic K-means (IGK)