International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 25
Year of Publication: 2025
Authors: Dimitrios Papakyriakou, Ioannis S. Barbounakis
Dimitrios Papakyriakou, Ioannis S. Barbounakis. Performance Analysis of Raspberry Pi 4B (8GB) Beowulf Cluster: HPCG Benchmarking. International Journal of Computer Applications. 187, 25 (Jul 2025), 49-64. DOI=10.5120/ijca2025925449
The High-Performance Conjugate Gradient (HPCG) benchmark has emerged as a complementary metric to the High-Performance LINPACK (HPL) [1], aiming to evaluate real-world high-performance computing (HPC) workloads that stress memory access patterns, cache behavior, and sparse matrix operations. Unlike HPL, which reflects peak floating-point capability, HPCG simulates practical scientific computations involving iterative solvers and irregular memory access, offering a more realistic performance indicator. This study investigates the implementation and analysis of the HPCG benchmark on a 24-node Beowulf cluster built from Raspberry Pi 4B devices, each equipped with 8GB of LPDDR4 RAM and an ARM Cortex-A72 processor. Both strong scaling (fixed problem size with increasing node count) and weak scaling (problem size increased in proportion to node count) methodologies were applied to assess system performance across various configurations. Metrics such as median execution time, floating-point throughput (GFLOP/s), and memory bandwidth (GB/s) were collected and analyzed. The results reveal that HPCG performance on this ARM-based cluster is primarily constrained by memory bandwidth saturation, the lack of hardware-level floating-point acceleration, and network communication bottlenecks. Strong scaling experiments show minimal performance gains beyond 4–8 nodes, while weak scaling maintains computational stability up to moderate cluster sizes. Notably, the absence of measurable MPI communication overhead (ExchangeHalo time) reflects the limited halo data exchange under small subdomain decompositions and short runtimes. This study highlights both the limitations and the potential of energy-efficient, low-cost single-board clusters for realistic HPC workloads.
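The strong- and weak-scaling definitions used above can be expressed as simple efficiency formulas. The sketch below illustrates them in Python; the timings are hypothetical placeholders for illustration only, not measurements from this study:

```python
# Illustrative strong- vs. weak-scaling efficiency calculations.
# All timing values below are hypothetical placeholders, NOT data
# from the Raspberry Pi cluster experiments.

def strong_scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Fixed total problem size: speedup = t1 / tn, efficiency = speedup / n."""
    return (t1 / tn) / n

def weak_scaling_efficiency(t1: float, tn: float) -> float:
    """Per-node problem size fixed: ideal runtime stays flat, so efficiency = t1 / tn."""
    return t1 / tn

if __name__ == "__main__":
    # Placeholder wall-clock times (seconds) for 1 node and 8 nodes.
    t1, t8_strong, t8_weak = 120.0, 22.0, 135.0
    print(f"strong efficiency at 8 nodes: {strong_scaling_efficiency(t1, t8_strong, 8):.2f}")
    print(f"weak efficiency at 8 nodes:   {weak_scaling_efficiency(t1, t8_weak):.2f}")
```

An efficiency near 1.0 indicates near-ideal scaling; the paper's observation of minimal strong-scaling gains beyond 4–8 nodes corresponds to this metric dropping well below 1.0 as nodes are added.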
The findings provide a methodological basis for benchmarking sparse solvers on ARM systems and inform future efforts in optimizing parallelism, memory access, and interconnect efficiency in edge computing, education, and embedded HPC environments.