Research Article

Preprocessing Techniques in Web Usage Mining: A Survey

by Mitali Srivastava, Rakhi Garg, P. K. Mishra
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 97 - Number 18
Year of Publication: 2014

Mitali Srivastava, Rakhi Garg, P. K. Mishra. Preprocessing Techniques in Web Usage Mining: A Survey. International Journal of Computer Applications. 97, 18 (July 2014), 1-9. DOI=10.5120/17104-7737

Because the data available on the web is huge, unstructured, and scattered, it is difficult for users to find relevant information quickly. To address this, web site design improvement, content personalization, prefetching, and caching are carried out on the basis of user behavior analysis. Users' activities are captured in special files called log files, of which there are several types: server logs, proxy server logs, and client/browser logs. Web usage mining uses these log files to analyze and discover useful patterns. The process involves three interdependent steps: data preprocessing, pattern discovery, and pattern analysis. Among these, data preprocessing plays a vital role because log data is unstructured, redundant, and noisy. To improve the later phases of pattern discovery and pattern analysis, several data preprocessing techniques, such as data cleaning, user identification, session identification, and path completion, have been used. This paper discusses all of these techniques in detail. It also categorizes them and lists their advantages and disadvantages, which should help scientists, researchers, and academicians working in this area.
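
As a rough illustration of three of the preprocessing steps named above (data cleaning, user identification, and session identification), the following Python sketch processes server log lines. It is not taken from the paper. Everything in it is an assumption made for illustration: the Combined Log Format input, the list of ignored file suffixes, the "bot" substring check for robots, the use of the (IP address, user agent) pair as a user key, and the 30-minute inactivity timeout. Path completion is not covered.

```python
import re
from datetime import datetime, timedelta

# Assumed input: Combined Log Format (an assumption, not specified by the paper).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Hypothetical cleaning rules chosen for this sketch.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


def clean(records):
    """Data cleaning: drop unparsed lines, non-GET requests, failed requests,
    embedded resources (images, styles, scripts), and obvious robots."""
    kept = []
    for r in records:
        if r is None or r["method"] != "GET":
            continue
        if not r["status"].startswith("2"):
            continue
        if r["url"].split("?")[0].lower().endswith(IGNORED_SUFFIXES):
            continue
        if "bot" in r["agent"].lower():
            continue
        kept.append(r)
    return kept


def sessionize(records):
    """User identification by (IP, user agent), then time-based session
    identification: a gap longer than SESSION_TIMEOUT starts a new session."""
    sessions = {}   # user key -> list of sessions, each a list of URLs
    last_seen = {}  # user key -> time of that user's previous request
    for r in sorted(records, key=lambda rec: rec["time"]):
        user = (r["ip"], r["agent"])
        if user not in last_seen or r["time"] - last_seen[user] > SESSION_TIMEOUT:
            sessions.setdefault(user, []).append([])
        sessions[user][-1].append(r["url"])
        last_seen[user] = r["time"]
    return sessions
```

A caller would typically run `sessionize(clean(parse_line(line) for line in log_file))`. The heuristics are deliberately simple; identifying users by IP address and user agent, for example, cannot separate users behind a shared proxy.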

Index Terms

Computer Science
Information Sciences


Data mining, Web mining, Web usage mining, Data preprocessing