Analysis of Clustering Techniques on Big Data

Research Article

Published in March 2017 by Prachi P. Surwade and S. S. Banait

Emerging Trends in Computing

Foundation of Computer Science USA

ETC2016 - Number 4

March 2017

Authors: Prachi P. Surwade, S. S. Banait

Prachi P. Surwade, S. S. Banait. Analysis of Clustering Techniques on Big Data. Emerging Trends in Computing. ETC2016, 4 (March 2017), 28-34.

@article{

author = { Prachi P. Surwade and S. S. Banait },

title = { Analysis of Clustering Techniques on Big Data },

journal = { Emerging Trends in Computing },

issue_date = { March 2017 },

volume = { ETC2016 },

number = { 4 },

month = { March },

year = { 2017 },

issn = { 0975-8887 },

pages = { 28-34 },

numpages = 7,

url = { /proceedings/etc2016/number4/27325-6278/ },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Proceeding Article

%1 Emerging Trends in Computing

%A Prachi P. Surwade

%A S. S. Banait

%T Analysis of Clustering Techniques on Big Data

%J Emerging Trends in Computing

%@ 0975-8887

%V ETC2016

%N 4

%P 28-34

%D 2017

%I International Journal of Computer Applications

Abstract

Data generated by scientific applications and corporate environments has grown rapidly, not only in size but also in variety. The volume collected is so large that gathering and analyzing it is difficult. Data mining extracts useful information and hidden relationships from data, but traditional data mining approaches cannot be applied directly to big data because of their inherent complexity. Cluster analysis, one of the most important data mining methods, groups similar objects together. However, it performs poorly on big data because of its high time complexity. Parallelization is a better approach in such scenarios. MapReduce is a popular programming model that enables parallel processing in a distributed environment. This paper proposes a system for analyzing the performance of two clustering techniques on a big dataset. The goal is to determine which of K-Medoid and BIRCH clustering performs better when applied to a large real-life dataset.
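
To illustrate how a K-Medoid iteration might be split into map and reduce phases, the following is a minimal single-machine sketch in Python. It is not the authors' implementation and does not use Hadoop. The function names (map_assign, reduce_update, k_medoids), the Euclidean distance, and the choice of caller-supplied initial medoids are all assumptions made for illustration. The map phase assigns each point to its nearest medoid, and the reduce phase picks, for each cluster, the member with the smallest total distance to the other members.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_assign(points, medoids):
    """Map phase (illustrative): emit (index of nearest medoid, point)."""
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
        yield nearest, p


def reduce_update(cluster_points):
    """Reduce phase (illustrative): choose the member of the cluster
    whose summed distance to all members is smallest."""
    return min(
        cluster_points,
        key=lambda c: sum(dist(c, q) for q in cluster_points),
    )


def k_medoids(points, initial_medoids, max_iter=20):
    """Alternate map and reduce phases until the medoids stop changing.

    initial_medoids is supplied by the caller (an assumption of this
    sketch); a real system would need an initialization strategy.
    """
    medoids = list(initial_medoids)
    groups = {}
    for _ in range(max_iter):
        # Shuffle step: group mapper output by cluster index.
        groups = defaultdict(list)
        for key, p in map_assign(points, medoids):
            groups[key].append(p)
        # Keep the old medoid if a cluster received no points.
        new_medoids = [
            reduce_update(groups[i]) if groups.get(i) else medoids[i]
            for i in range(len(medoids))
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, dict(groups)
```

In a distributed setting each mapper would process one partition of the data and each reducer one cluster. The reduce step shown here compares every pair of points within a cluster, which is the quadratic cost that makes K-Medoid expensive on large datasets.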

Index Terms

Computer Science

Information Sciences

Keywords

Big Data, Clustering, Hadoop, MapReduce, K-Medoid (KM), BIRCH