Research Article

Preprocessing Techniques in Web Usage Mining: A Survey

by Mitali Srivastava, Rakhi Garg, P. K. Mishra
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 97 - Number 18
Year of Publication: 2014
Authors: Mitali Srivastava, Rakhi Garg, P. K. Mishra

Mitali Srivastava, Rakhi Garg, P. K. Mishra. Preprocessing Techniques in Web Usage Mining: A Survey. International Journal of Computer Applications. 97, 18 (July 2014), 1-9. DOI=10.5120/17104-7737

@article{ 10.5120/17104-7737,
author = { Mitali Srivastava and Rakhi Garg and P. K. Mishra },
title = { Preprocessing Techniques in Web Usage Mining: A Survey },
journal = { International Journal of Computer Applications },
issue_date = { July 2014 },
volume = { 97 },
number = { 18 },
month = { July },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-9 },
numpages = {9},
url = { },
doi = { 10.5120/17104-7737 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:24:26.026256+05:30
%A Mitali Srivastava
%A Rakhi Garg
%A P. K. Mishra
%T Preprocessing Techniques in Web Usage Mining: A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 97
%N 18
%P 1-9
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Because the data available on the web is huge, unstructured and scattered, it is difficult for users to find relevant information quickly. To address this, web site design is improved, content is personalized, and prefetching and caching are tuned according to an analysis of user behavior. User activities are recorded in special files called log files, of which there are several types: server logs, proxy server logs and client/browser logs. Web usage mining analyzes these log files to discover useful patterns. The process involves three interdependent steps: data preprocessing, pattern discovery and pattern analysis. Among these, data preprocessing plays a vital role because log data is unstructured, redundant and noisy. To improve the later phases, pattern discovery and pattern analysis, several data preprocessing techniques have been used, such as data cleaning, user identification, session identification and path completion. This paper discusses these techniques in detail. It also categorizes them and lists their advantages and disadvantages, which should help scientists, researchers and academicians working in this area.
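
As a rough illustration of three of the preprocessing steps named in the abstract, the following Python sketch cleans a handful of server log lines, identifies users, and splits each user's requests into sessions. It is a minimal sketch under stated assumptions, not the method of the paper: the Combined Log Format layout, the sample log lines, the list of ignored file suffixes, the "IP address plus user agent" user heuristic, the 30-minute inactivity timeout, and every function name are chosen here for illustration only. Path completion is not covered.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input layout (Combined Log Format):
# host ident user [time] "request" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative choices, not values taken from the paper.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


def is_relevant(entry):
    """Data cleaning: keep successful GET page requests not made by robots."""
    if entry["method"] != "GET" or not 200 <= entry["status"] < 300:
        return False
    if entry["url"].lower().endswith(IGNORED_SUFFIXES):
        return False
    if entry["url"] == "/robots.txt" or "bot" in entry["agent"].lower():
        return False
    return True


def identify_sessions(entries):
    """User identification by (IP, agent), then time-oriented sessionization."""
    by_user = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e["time"]):
        by_user[(entry["ip"], entry["agent"])].append(entry)

    sessions = []
    for user, requests in by_user.items():
        current = [requests[0]]
        for previous, following in zip(requests, requests[1:]):
            # A gap longer than the timeout closes the current session.
            if following["time"] - previous["time"] > SESSION_TIMEOUT:
                sessions.append((user, current))
                current = []
            current.append(following)
        sessions.append((user, current))
    return sessions


def preprocess(lines):
    """Run parsing, cleaning, and session identification on raw log lines."""
    parsed = (parse_line(line) for line in lines)
    cleaned = [entry for entry in parsed if entry and is_relevant(entry)]
    return identify_sessions(cleaned)


# Hypothetical sample data for demonstration.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Jul/2014:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Jul/2014:10:00:01 +0000] "GET /logo.png HTTP/1.1" 200 512 "/index.html" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Jul/2014:10:05:00 +0000] "GET /products.html HTTP/1.1" 200 2048 "/index.html" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Jul/2014:11:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Jul/2014:10:01:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "ExampleBot/1.0"',
    '10.0.0.1 - - [10/Jul/2014:10:06:00 +0000] "GET /missing.html HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for user, requests in preprocess(SAMPLE_LOG):
        print(user[0], [request["url"] for request in requests])
```

On the sample data, the image request, the robot request and the failed request are dropped during cleaning, and the remaining three page views from the same user fall into two sessions because of the 55-minute gap before the last request. Identifying users by IP address and user agent alone is a simplification; it cannot distinguish users behind a shared proxy.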

Index Terms

Computer Science
Information Sciences


Data mining, Web mining, Web usage mining, Data preprocessing