Research Article

Web mining of log files using Hadoop MapReduce

Published on April 2012 by Janu Oswal, Poorvi Jain, Rupali Phanase, Shweta Parjane
Emerging Trends in Computer Science and Information Technology (ETCSIT2012)
Foundation of Computer Science USA
ETCSIT - Number 4
April 2012
Authors: Janu Oswal, Poorvi Jain, Rupali Phanase, Shweta Parjane

Janu Oswal, Poorvi Jain, Rupali Phanase, Shweta Parjane. Web mining of log files using Hadoop MapReduce. Emerging Trends in Computer Science and Information Technology (ETCSIT2012). ETCSIT, 4 (April 2012), 39-45.

@article{
author = { Janu Oswal and Poorvi Jain and Rupali Phanase and Shweta Parjane },
title = { Web mining of log files using Hadoop MapReduce },
journal = { Emerging Trends in Computer Science and Information Technology (ETCSIT2012) },
issue_date = { April 2012 },
volume = { ETCSIT },
number = { 4 },
month = { April },
year = { 2012 },
issn = {0975-8887},
pages = { 39-45 },
numpages = 7,
url = { /proceedings/etcsit/number4/5990-1042/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Emerging Trends in Computer Science and Information Technology (ETCSIT2012)
%A Janu Oswal
%A Poorvi Jain
%A Rupali Phanase
%A Shweta Parjane
%T Web mining of log files using Hadoop MapReduce
%J Emerging Trends in Computer Science and Information Technology (ETCSIT2012)
%@ 0975-8887
%V ETCSIT
%N 4
%P 39-45
%D 2012
%I International Journal of Computer Applications
Abstract

Virtual Database Technology (VDB) is one of the effective solutions for integrating data from heterogeneous sources. Integration becomes complex when the database is very large. MapReduce is a new framework specifically designed for processing huge datasets on distributed sources. Apache's Hadoop is an implementation of MapReduce. This paper proposes to utilize this parallel and distributed processing capability so that virtual servers respond to region-wise queries. The output shows a graph of the Oracle space required and the Hadoop space required for the project, with the reduced data displayed in a text box.
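The MapReduce model named in the abstract can be illustrated with a minimal, single-process sketch in Python. This is not the authors' implementation and does not use Hadoop itself; it only mimics the map, shuffle, and reduce phases for a region-wise count over log lines. The log format (`region<TAB>url`) and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit (region, 1) for one log line.

    Assumes a hypothetical 'region<TAB>url' format; malformed lines are skipped.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2 or not parts[0]:
        return []
    return [(parts[0], 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(region, counts):
    """Reduce step: sum the per-line counts for one region."""
    return region, sum(counts)


def count_hits_by_region(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of log lines."""
    pairs = [pair for line in lines for pair in map_log_line(line)]
    grouped = shuffle(pairs)
    return dict(reduce_counts(region, counts) for region, counts in grouped.items())


if __name__ == "__main__":
    sample = [
        "asia\t/index.html",
        "europe\t/about.html",
        "asia\t/contact.html",
        "bad line",  # skipped: no tab-separated region field
    ]
    print(count_hits_by_region(sample))
```

In a real Hadoop job the map and reduce functions would run on separate nodes and the framework would perform the shuffle; here all three phases run in one process so the data flow is easy to follow.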

References
  1. Wenhao Xu, Jing Li, Yongwei Wu, Xiaomeng Huang, Guangwen Yang, VDM: Virtual Database Management for Distributed and File System, Grid and Cooperative Computing (2008), IEEE.
  2. Yuji Wada, Yuta Watanabe, Keisuke Syoubu, Jun Sawamoto, Takashi Katoh. Virtual Database Technology for Distributed Database, 2010 IEEE 24th, International Conference on Advanced Information Networking and Applications Workshop.
  3. Ferreira, R., Mouraires, J., Martins, R., Pntoquilho, M. XML based Metadata Repository for Information Systems, IEEE Artificial intelligence conference, 2005.
  4. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Google Research Publication (2004)
  5. Simplified Data Processing on Large Clusters. Google Research Publication (2004).
  6. Ralf Lammel. Google's MapReduce Programming Model Revisited. Science of Computer Programming archive. Volume 68, (2008).
  7. Apache Hadoop, http://hadoop.apache.org.
  8. Tom White. Hadoop: The Definitive Guide. O'Reilly, Sebastopol, California, 2009.
  9. Gang Chen, Yongwei Wu, Jia Liu, Guangwen Yang and Weimin Zheng. Optimization of subquery processing in distributed data integration systems. Journal of Network and Computer Applications (2010).
Index Terms

Computer Science
Information Sciences

Keywords

Virtual Server, Web Mining, Query Optimization, MapReduce, Hadoop