Research Article

Bisecting K-Means for Clustering Web Log data

by Ruchika Patil, Amreen Khan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 116 - Number 19
Year of Publication: 2015
Authors: Ruchika Patil, Amreen Khan
10.5120/20448-2799

Ruchika Patil, Amreen Khan. Bisecting K-Means for Clustering Web Log data. International Journal of Computer Applications. 116, 19 (April 2015), 36-41. DOI=10.5120/20448-2799

@article{ 10.5120/20448-2799,
author = { Ruchika Patil, Amreen Khan },
title = { Bisecting K-Means for Clustering Web Log data },
journal = { International Journal of Computer Applications },
issue_date = { April 2015 },
volume = { 116 },
number = { 19 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 36-41 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume116/number19/20448-2799/ },
doi = { 10.5120/20448-2799 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:57:37.923470+05:30
%A Ruchika Patil
%A Amreen Khan
%T Bisecting K-Means for Clustering Web Log data
%J International Journal of Computer Applications
%@ 0975-8887
%V 116
%N 19
%P 36-41
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Web usage mining is the area of web mining that deals with extracting useful knowledge from the log information produced by web servers. One of the most important tasks of Web Usage Mining (WUM) is web user clustering, which groups users who exhibit similar interests or browsing patterns. This paper presents the results of clustering Web log data with the K-means and Bisecting K-means algorithms. Clusters are formed from similar IP address and packet combinations. The clustering framework is then used as an approach to intrusion detection from the log files: the system is first trained by labeling the classes and then tested to check for intrusions. A recommendation output is generated that helps classify whether the input IPs are "safe" or "infected". The two algorithms are compared, and their performance is evaluated with respect to time and accuracy. The experimental results show that Bisecting K-means overcomes the major drawbacks of the basic K-means algorithm.
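
For readers unfamiliar with the technique, the general idea of Bisecting K-means can be sketched as follows: start with all points in one cluster, then repeatedly pick the cluster with the largest sum of squared errors (SSE) and split it in two with basic K-means (k = 2). This is a minimal illustration in plain Python, not the authors' implementation. The feature vectors (requests per minute, mean packet size), the farthest-point seeding, and the names `two_means`, `bisecting_kmeans`, and `sessions` are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean (centroid) of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    if not points:
        return 0.0
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)

def two_means(points, rng, iters=20):
    """Basic K-means with k=2. Seeding is an assumption of this sketch:
    one random point plus the point farthest from it."""
    first = rng.choice(points)
    centers = [first, max(points, key=lambda p: sq_dist(p, first))]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break  # assignments have stabilised
        centers = new_centers
    return groups

def bisecting_kmeans(points, k, trials=5, seed=0):
    """Split the highest-SSE cluster in two until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        if sse(clusters[worst]) == 0:
            break  # every remaining cluster holds identical points
        target = clusters.pop(worst)
        # Try several bisections and keep the one with the lowest total SSE.
        splits = [two_means(target, rng) for _ in range(trials)]
        valid = [s for s in splits if s[0] and s[1]]
        if not valid:
            clusters.append(target)
            break
        clusters.extend(min(valid, key=lambda s: sse(s[0]) + sse(s[1])))
    return clusters

# Hypothetical per-IP features: (requests per minute, mean packet size).
sessions = [(1, 1), (2, 1), (1, 2), (2, 2),
            (11, 11), (12, 11), (11, 12), (12, 12),
            (31, 31), (32, 31), (31, 32), (32, 32)]

clusters = bisecting_kmeans(sessions, k=3)
print(sorted(len(c) for c in clusters))  # [4, 4, 4]
```

Because each step only ever runs a two-way split, the result depends less on the initial choice of k centroids than basic K-means does, which is one commonly cited reason for preferring the bisecting variant. Labeling the resulting clusters as "safe" or "infected" would be a separate step and is not shown here.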

Index Terms

Computer Science
Information Sciences

Keywords

Web mining, Clustering, Bisecting K-means, Intrusion detection