Research Article

Analysis of Clustering Techniques on Big Data

Published in March 2017 by Prachi P. Surwade and S. S. Banait
Emerging Trends in Computing
Foundation of Computer Science USA
ETC2016 - Number 4

Prachi P. Surwade and S. S. Banait. Analysis of Clustering Techniques on Big Data. Emerging Trends in Computing. ETC2016, 4 (March 2017), 28-34.

@article{
author = { Prachi P. Surwade and S. S. Banait },
title = { Analysis of Clustering Techniques on Big Data },
journal = { Emerging Trends in Computing },
issue_date = { March 2017 },
volume = { ETC2016 },
number = { 4 },
month = { March },
year = { 2017 },
issn = { 0975-8887 },
pages = { 28-34 },
numpages = 7,
url = { /proceedings/etc2016/number4/27325-6278/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Emerging Trends in Computing
%A Prachi P. Surwade
%A S. S. Banait
%T Analysis of Clustering Techniques on Big Data
%J Emerging Trends in Computing
%@ 0975-8887
%V ETC2016
%N 4
%P 28-34
%D 2017
%I International Journal of Computer Applications
Abstract

Data generated by scientific applications and corporate environments has grown rapidly, not only in size but also in variety. The volume collected is so large that gathering and analyzing it is difficult. Data mining extracts useful information and hidden relationships from data, but traditional data mining approaches cannot be applied directly to big data because of their inherent complexity. Cluster analysis, one of the most important data mining methods, groups similar objects together. However, it performs poorly on big data because of its high time complexity. In such scenarios parallelization is a better approach. MapReduce is a popular programming model that enables parallel processing in a distributed environment. This paper proposes a system for analyzing the performance of two clustering techniques on a big dataset. The goal is to determine which of K-Medoid and BIRCH clustering performs better when applied to a large real-life dataset.
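
As an illustration of the K-Medoid technique named in the abstract, the following is a minimal single-machine sketch in Python. It is not the authors' implementation and does not use Hadoop. It assumes Euclidean distance and the simple alternating (assign, then update) variant of k-medoids, and the function names `assign`, `update`, and `k_medoids` are hypothetical. The comments note which step is independent per point and which is independent per cluster, as one plausible way the work could be split into map and reduce phases.

```python
from math import dist  # Euclidean distance, Python 3.8+


def assign(points, medoids):
    """Map-like step: label each point with the index of its nearest medoid.

    Each point is handled independently of the others.
    """
    return [
        min(range(len(medoids)), key=lambda j: dist(p, medoids[j]))
        for p in points
    ]


def update(points, labels, medoids):
    """Reduce-like step: per cluster, pick the member with the smallest
    total distance to all other members of that cluster.
    """
    new_medoids = []
    for j, old in enumerate(medoids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Keep the old medoid if a cluster ends up empty.
            new_medoids.append(old)
            continue
        best = min(members, key=lambda c: sum(dist(c, q) for q in members))
        new_medoids.append(best)
    return new_medoids


def k_medoids(points, initial_medoids, max_iter=100):
    """Alternate assign and update until the medoids stop changing."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = update(points, labels, medoids)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Recompute labels so they always match the returned medoids.
    return medoids, assign(points, medoids)
```

Called as `k_medoids(points, initial_medoids)` with a list of coordinate tuples and a list of starting medoids drawn from those points, the function returns the final medoids and one cluster label per point. Because each medoid is always an actual data point, the method is less sensitive to outliers than a mean-based centroid, at the cost of the quadratic work inside each cluster during the update step.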

Index Terms

Computer Science
Information Sciences

Keywords

Big Data, Clustering, Hadoop, MapReduce, K-Medoid (KM), BIRCH