Research Article

Use of Mapreduce for Data Mining and Data Optimization on a Web Portal

by Christopher A. Moturi, Silas K. Maiyo
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 56 - Number 7
Year of Publication: 2012
Authors: Christopher A. Moturi, Silas K. Maiyo
10.5120/8906-2945

Christopher A. Moturi, Silas K. Maiyo. Use of Mapreduce for Data Mining and Data Optimization on a Web Portal. International Journal of Computer Applications. 56, 7 (October 2012), 39-43. DOI=10.5120/8906-2945

@article{ 10.5120/8906-2945,
author = { Christopher A. Moturi, Silas K. Maiyo },
title = { Use of Mapreduce for Data Mining and Data Optimization on a Web Portal },
journal = { International Journal of Computer Applications },
issue_date = { October 2012 },
volume = { 56 },
number = { 7 },
month = { October },
year = { 2012 },
issn = { 0975-8887 },
pages = { 39-43 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume56/number7/8906-2945/ },
doi = { 10.5120/8906-2945 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:58:15.132762+05:30
%A Christopher A. Moturi
%A Silas K. Maiyo
%T Use of Mapreduce for Data Mining and Data Optimization on a Web Portal
%J International Journal of Computer Applications
%@ 0975-8887
%V 56
%N 7
%P 39-43
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper studied the design, implementation, and evaluation of a MapReduce tool targeting distributed systems and multi-core architectures. MapReduce is a distributed programming model originally proposed by Google to ease the development of web search applications across large clusters of computers. We addressed the problem of limited resources for data optimization, with respect to the efficiency, reliability, scalability, and security of data in distributed cluster systems with huge datasets. The experimental results indicated that the MapReduce tool improved data optimization. The system exhibits poor speedup with small datasets, but achieves reasonable speedup once the dataset is large enough to match the number of computing nodes, reducing execution time by 30% compared to conventional data mining and processing. The tool keeps pace with data growth, especially with a larger number of computing nodes, and scaleup grows gracefully as the data and the number of computing nodes increase. Data security is maintained at all computing nodes because data is replicated across several nodes of the cluster, which also makes the system reliable. Our MapReduce implementation runs in the distributed cluster computing environment of a national education web portal and is highly scalable.
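
For readers unfamiliar with the programming model the abstract refers to, the following is a minimal single-process sketch of the map, shuffle, and reduce phases, using word counting as the task. It is not the authors' tool and does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are assumptions made purely for illustration. In a real cluster, the framework would run the map and reduce calls on different nodes and perform the grouping step over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done by the framework in a real system)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially over the input records."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input records, standing in for lines of portal data.
    sample = ["portal exam results", "exam results portal portal"]
    print(run_job(sample))
```

Because each `map_phase` call depends only on its own record and each `reduce_phase` call only on its own key, both phases can be spread over many nodes, which is the property that the reported speedup and scaleup measurements rely on.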

Index Terms

Computer Science
Information Sciences

Keywords

MapReduce, Hadoop, Scalability