Research Article

Benchmarking Raspberry Pi 2 Hadoop Cluster

by Dimitrios Papakyriakou
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 42
Year of Publication: 2019
Authors: Dimitrios Papakyriakou
10.5120/ijca2019919328

Dimitrios Papakyriakou. Benchmarking Raspberry Pi 2 Hadoop Cluster. International Journal of Computer Applications. 178, 42 (Aug 2019), 37-47. DOI=10.5120/ijca2019919328

@article{ 10.5120/ijca2019919328,
author = { Dimitrios Papakyriakou },
title = { Benchmarking Raspberry Pi 2 Hadoop Cluster },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2019 },
volume = { 178 },
number = { 42 },
month = { Aug },
year = { 2019 },
issn = { 0975-8887 },
pages = { 37-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume178/number42/30820-2019919328/ },
doi = { 10.5120/ijca2019919328 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:52:55.227902+05:30
%A Dimitrios Papakyriakou
%T Benchmarking Raspberry Pi 2 Hadoop Cluster
%J International Journal of Computer Applications
%@ 0975-8887
%V 178
%N 42
%P 37-47
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With data growth driven by the Internet and the Internet of Things (IoT), big data is becoming not only important but also very challenging for data centers. Apache Hadoop is a framework for the distributed processing of large datasets across clusters of computers. Big data analytics applications have already started to move beyond the classic Hadoop architecture towards near-real-time architectures such as Spark. In this sense, a fundamental understanding of Hadoop and MapReduce principles, and of the services that operate on top of the Hadoop core (e.g. Hive, HBase), is a good starting point for gaining an overview of the big data world. This manuscript presents the design and deployment of a Hadoop cluster, together with a performance evaluation based on benchmarking and stress testing. Because the Raspberry Pi is an affordable single-board computer (SBC), it gives everyone the chance to build their knowledge and to contribute, to a reasonable degree, to the academic community, drawing on the Raspberry Pi 2's capabilities as an integrated computer. The cluster comprises 15 low-cost Raspberry Pi 2 Model B computers, each with a 900 MHz 32-bit quad-core ARM Cortex-A7 CPU and 1 GB of RAM. The most common benchmarking and testing tools included in the Apache Hadoop distribution are TestDFSIO, TeraSort, NNBench and MRBench. These tools are popular choices for benchmarking and stress testing a Hadoop cluster: measuring performance, comparing results and sharing the outcome with others interested in the topic. In this project, TestDFSIO is used to stress test the Hadoop cluster.
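
The MapReduce principle mentioned in the abstract can be illustrated with a minimal sketch. The example below is plain Python and does not use Hadoop; it only mimics the map, shuffle and reduce stages of a word count on a single machine. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; a real Hadoop job would read splits from HDFS.
    sample = ["big data on small boards", "small boards big cluster"]
    print(word_count(sample))
```

On a real cluster, the map and reduce steps would run as separate tasks on different nodes, and the shuffle would move data between them over the network; this sketch keeps everything in one process to show only the data flow.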

Index Terms

Computer Science
Information Sciences

Keywords

Raspberry Pi Hadoop cluster, Cloud Computing, Hadoop, Big Data, Big Data Analytics, Parallel Computing, MapReduce, Hadoop cluster benchmark