Bisecting K-Means for Clustering Web Log data

International Journal of Computer Applications
© 2015 by IJCA Journal
Volume 116 - Number 19
Year of Publication: 2015
Authors:
Ruchika Patil
Amreen Khan
DOI: 10.5120/20448-2799

Ruchika Patil and Amreen Khan. Article: Bisecting K-Means for Clustering Web Log data. International Journal of Computer Applications 116(19):36-41, April 2015. Full text available. BibTeX

@article{key:article,
	author = {Ruchika Patil and Amreen Khan},
	title = {Article: Bisecting K-Means for Clustering Web Log data},
	journal = {International Journal of Computer Applications},
	year = {2015},
	volume = {116},
	number = {19},
	pages = {36-41},
	month = {April},
	note = {Full text available}
}

Abstract

Web usage mining is the area of web mining that deals with extracting useful knowledge from the web log information produced by web servers. One of the most important tasks of Web Usage Mining (WUM) is web user clustering, which forms groups of users exhibiting similar interests or browsing patterns. This paper presents the results of clustering Web log data using the K-means and Bisecting K-means algorithms. Clusters are formed with respect to similar IP address and packet combinations. The clustering framework is further used as an approach for intrusion detection from the log files. The system is first trained by labeling the classes and then tested to check for any intrusions. A recommendation output is generated that helps classify whether the input IPs are "safe" or "infected". The two algorithms are compared, and their performance is evaluated with respect to time and accuracy. The experimental results show that Bisecting K-means overcomes the major drawbacks of the basic K-means algorithm.
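
For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of bisecting K-means, not the authors' implementation. It assumes each IP has already been reduced to a numeric feature vector; the two features used here (requests per minute, mean packet size) and all function names are hypothetical. As a simplification, each 2-way split is seeded deterministically with a farthest-point heuristic, whereas the usual formulation tries several random bisections and keeps the one with the lowest error.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, max_iters=50):
    """Basic K-means with k = 2, seeded by a farthest-point heuristic."""
    mean = centroid(points)
    c0 = max(points, key=lambda p: sq_dist(p, mean))  # farthest from the mean
    c1 = max(points, key=lambda p: sq_dist(p, c0))    # farthest from c0
    left, right = [], []
    for _ in range(max_iters):
        left, right = [], []
        for p in points:
            (left if sq_dist(p, c0) <= sq_dist(p, c1) else right).append(p)
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        new0, new1 = centroid(left), centroid(right)
        if (new0, new1) == (c0, c1):
            break  # assignments no longer change
        c0, c1 = new0, new1
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly split the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) >= 2]
        if not splittable:
            break
        target = max(splittable, key=sse)
        left, right = two_means(target)
        if not left or not right:
            break  # the worst cluster cannot be split any further
        clusters = [c for c in clusters if c is not target]
        clusters.extend([left, right])
    return clusters


# Hypothetical per-IP features: (requests per minute, mean packet size in KB).
points = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.2),
    (10.0, 10.0), (10.5, 11.0), (11.0, 10.2),
    (20.0, 1.0), (20.5, 2.0), (21.0, 1.2),
]
clusters = bisecting_kmeans(points, k=3)
for cluster in clusters:
    print(sorted(cluster))
```

Splitting the cluster with the largest sum of squared errors is one common selection rule; splitting the largest cluster by size is another. The sketch covers only the clustering step, so labeling clusters as "safe" or "infected" would still need a separate training stage.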
