Bisecting K-Means for Clustering Web Log data

Ruchika Patil; Amreen Khan

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 116 - Number 19

Year of Publication: 2015

Authors: Ruchika Patil, Amreen Khan

DOI: 10.5120/20448-2799

Ruchika Patil, Amreen Khan. Bisecting K-Means for Clustering Web Log data. International Journal of Computer Applications 116, 19 (April 2015), 36-41. DOI=10.5120/20448-2799

@article{10.5120/20448-2799,
  author = {Ruchika Patil and Amreen Khan},
  title = {Bisecting K-Means for Clustering Web Log data},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2015},
  volume = {116},
  number = {19},
  month = {April},
  year = {2015},
  issn = {0975-8887},
  pages = {36--41},
  numpages = {6},
  url = {https://ijcaonline.org/archives/volume116/number19/20448-2799/},
  doi = {10.5120/20448-2799},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%A Ruchika Patil
%A Amreen Khan
%T Bisecting K-Means for Clustering Web Log data
%J International Journal of Computer Applications
%@ 0975-8887
%V 116
%N 19
%P 36-41
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Web usage mining is the area of web mining that deals with extracting useful knowledge from the log data produced by web servers. One of the most important tasks of Web Usage Mining (WUM) is web user clustering, which groups users who exhibit similar interests or similar browsing patterns. This paper presents the results of clustering web log data with the K-means and Bisecting K-means algorithms. Clusters are formed on the basis of similar IP address and packet combinations. The clustering framework is then used as an approach to detecting intrusions from the log files: the system is first trained on data with labeled classes and then tested for intrusions. The system generates a recommendation output that classifies each input IP address as "safe" or "infected". The two algorithms are compared in terms of execution time and accuracy, and the experimental results show that Bisecting K-means overcomes the major drawbacks of the basic K-means algorithm.
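The abstract does not spell out the procedure, so the following is only a minimal Python sketch under stated assumptions. It assumes each log record has already been encoded as a numeric feature vector (the paper clusters on IP address and packet combinations, but that encoding is not described here). Bisecting K-means is shown here repeatedly splitting the cluster with the largest sum of squared errors (SSE) using plain 2-means, keeping the best of several trial splits; this split criterion is one common choice, not necessarily the authors'. Clusters are then labeled "safe" or "infected" by majority vote over hypothetical training labels, and new records are classified by their nearest centroid. All function names (`kmeans`, `bisecting_kmeans`, `label_clusters`, `classify`) are illustrative.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean (centroid) of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse(cluster):
    """Sum of squared errors of a cluster around its own centroid."""
    c = mean(cluster)
    return sum(sq_dist(p, c) for p in cluster)


def kmeans(points, k, rng, iters=100):
    """Basic Lloyd's K-means; returns k lists of points (some may be empty)."""
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # assignments have stabilised
            break
        centroids = new
    return clusters


def bisecting_kmeans(points, k, trials=5, seed=0):
    """Split the highest-SSE cluster with 2-means until there are k clusters."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters[idx]
        if len(target) < 2:
            break  # nothing left that can be split
        best = None
        for _ in range(trials):  # several trial bisections, keep the best
            halves = kmeans(target, 2, rng)
            if halves[0] and halves[1]:  # ignore degenerate splits
                score = sse(halves[0]) + sse(halves[1])
                if best is None or score < best[0]:
                    best = (score, halves)
        if best is None:
            break  # e.g. all remaining points are identical
        clusters[idx:idx + 1] = best[1]  # replace target with its two halves
    return clusters


def label_clusters(clusters, labels):
    """Attach the majority training label ('safe'/'infected') to each centroid.

    `labels` maps each training point (a tuple) to its class, so training
    points are assumed to be unique in this sketch.
    """
    model = []
    for c in clusters:
        majority, _ = Counter(labels[p] for p in c).most_common(1)[0]
        model.append((mean(c), majority))
    return model


def classify(point, model):
    """Label a new record with the class of its nearest cluster centroid."""
    _, label = min(model, key=lambda entry: sq_dist(point, entry[0]))
    return label


if __name__ == "__main__":
    # Toy 2-D feature vectors standing in for encoded log records.
    train = {
        (1.0, 1.0): "safe", (1.2, 0.9): "safe", (0.8, 1.1): "safe",
        (8.0, 8.0): "infected", (8.2, 7.9): "infected", (7.8, 8.1): "infected",
    }
    model = label_clusters(bisecting_kmeans(list(train), k=2), train)
    print(classify((1.1, 1.0), model))  # safe
    print(classify((7.9, 8.0), model))  # infected
```

Trying several bisections and keeping the lowest-SSE split is one reason the bisecting variant tends to be less sensitive to a single unlucky initialization than one run of basic K-means. A real system would also need the (unspecified) feature encoding of the log fields and a held-out test set to measure accuracy.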

Index Terms

Computer Science

Information Sciences

Keywords

Web mining, Clustering, Bisecting K-means, Intrusion detection