Use of Mapreduce for Data Mining and Data Optimization on a Web Portal

Christopher A. Moturi; Silas K. Maiyo

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 56 - Number 7

Year of Publication: 2012

Authors: Christopher A. Moturi, Silas K. Maiyo

10.5120/8906-2945

Christopher A. Moturi, Silas K. Maiyo. Use of Mapreduce for Data Mining and Data Optimization on a Web Portal. International Journal of Computer Applications. 56, 7 (October 2012), 39-43. DOI=10.5120/8906-2945

@article{ 10.5120/8906-2945,

author = { Christopher A. Moturi, Silas K. Maiyo },

title = { Use of Mapreduce for Data Mining and Data Optimization on a Web Portal },

journal = { International Journal of Computer Applications },

issue_date = { October 2012 },

volume = { 56 },

number = { 7 },

month = { October },

year = { 2012 },

issn = { 0975-8887 },

pages = { 39-43 },

numpages = {5},

url = { https://ijcaonline.org/archives/volume56/number7/8906-2945/ },

doi = { 10.5120/8906-2945 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T20:58:15.132762+05:30

%A Christopher A. Moturi

%A Silas K. Maiyo

%T Use of Mapreduce for Data Mining and Data Optimization on a Web Portal

%J International Journal of Computer Applications

%@ 0975-8887

%V 56

%N 7

%P 39-43

%D 2012

%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper studies the design, implementation, and evaluation of a MapReduce tool targeting distributed systems and multi-core system architectures. MapReduce is a distributed programming model originally proposed by Google to ease the development of web search applications on large clusters of computers. We address the problem of limited resources for data optimization, targeting the efficiency, reliability, scalability, and security of data in distributed cluster systems with huge datasets. The experimental results indicate that the MapReduce tool we developed improves data optimization. The system exhibits poor speedup with smaller datasets, but reasonable speedup is achieved with datasets large enough to match the number of computing nodes, reducing execution time by 30% compared with conventional data mining and processing. The MapReduce tool handles data growth well, especially with a larger number of computing nodes. Scaleup grows gracefully as the data size and the number of computing nodes increase. Data security is ensured at all computing nodes because data is replicated across various nodes of the cluster, which also makes the system reliable. Our implementation of MapReduce runs in the distributed cluster computing environment of a national education web portal and is highly scalable.
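
The MapReduce programming model named in the abstract can be illustrated with a minimal single-process sketch. This is not the authors' tool and does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, map_reduce) and the sample records are hypothetical, chosen only to show the map, shuffle, and reduce steps on a word-count task.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence within one process.

    In a real cluster the map and reduce calls would run on different
    nodes; here they run serially to keep the sketch self-contained.
    """
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical records standing in for portal data.
    sample = ["exam results portal", "portal exam timetable"]
    print(map_reduce(sample))
```

Because each map call reads only its own record and each reduce call reads only the values for its own key, the calls can in principle be distributed across computing nodes, which is the property the speedup and scaleup results in the abstract rely on.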

Index Terms

Computer Science

Information Sciences

Keywords

MapReduce, Hadoop, Scalability