Research Article

Parallel k-Means Benchmarking on a CPU-Bound Beowulf Cluster of Raspberry Pi Nodes: An MPI-based Scaling Analysis with CPU-Centric Performance Evaluation

by Dimitrios Papakyriakou, Ioannis S. Barbounakis
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 32
Year of Publication: 2025
Authors: Dimitrios Papakyriakou, Ioannis S. Barbounakis
10.5120/ijca2025925585

Dimitrios Papakyriakou, Ioannis S. Barbounakis. Parallel k-Means Benchmarking on a CPU-Bound Beowulf Cluster of Raspberry Pi Nodes: An MPI-based Scaling Analysis with CPU-Centric Performance Evaluation. International Journal of Computer Applications. 187, 32 (Aug 2025), 43-55. DOI=10.5120/ijca2025925585

@article{ 10.5120/ijca2025925585,
author = { Dimitrios Papakyriakou, Ioannis S. Barbounakis },
title = { Parallel k-Means Benchmarking on a CPU-Bound Beowulf Cluster of Raspberry Pi Nodes: An MPI-based Scaling Analysis with CPU-Centric Performance Evaluation },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2025 },
volume = { 187 },
number = { 32 },
month = { Aug },
year = { 2025 },
issn = { 0975-8887 },
pages = { 43-55 },
numpages = { 13 },
url = { https://ijcaonline.org/archives/volume187/number32/parallel-k-means-benchmarking-on-a-cpu-bound-beowulf-cluster-of-raspberry-pi-nodes-an-mpi-based-scaling-analysis-with-cpu-centric-performance-evaluation/ },
doi = { 10.5120/ijca2025925585 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

This study presents an in-depth parallel benchmarking analysis of the k-Means clustering algorithm on a Beowulf cluster composed of Raspberry Pi 4B nodes, each equipped with 8 GB of RAM. Leveraging MPI for distributed computation, we systematically evaluate the algorithm’s strong-scaling behaviour using a synthetic dataset of fixed size (75 million two-dimensional points) while varying the number of MPI processes from 2 up to 48 (two processes per node). The performance evaluation centres on a detailed execution-time decomposition across five key phases: data generation, parallel distance computation (Compute Phase), synchronization via MPI_Allreduce (Sync Phase), centroid updates (Update Phase), and the overall clustering loop (k-Means Phase), alongside total runtime. Results confirm that the Compute Phase remains the dominant contributor to total runtime, consistently accounting for the majority of execution time across all configurations. Synchronization overhead increases moderately at intermediate process counts, a typical phenomenon in distributed systems, but remains manageable and does not offset the overall speedup achieved through parallelization. The Beowulf cluster demonstrates excellent scalability and high parallel efficiency throughout the strong-scaling experiments, with total runtime reduced by nearly 10× when increasing from 2 to 48 MPI processes. Memory usage remains within physical RAM limits due to careful dataset partitioning, enabling large-scale processing on low-power ARM-based nodes. Overall, this work highlights the feasibility and efficiency of CPU-centric, memory-aware distributed machine learning on energy-efficient Raspberry Pi clusters. The proposed benchmarking framework provides a robust and reproducible foundation for analysing algorithmic performance, scalability, and resource utilization in lightweight distributed environments, aligning with contemporary trends in edge computing and resource-constrained high-performance computing.
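The phase decomposition described in the abstract can be illustrated with a minimal serial NumPy sketch of one k-Means iteration (this is an illustration of the general technique, not the authors' code). In the paper's MPI setting, each rank would run the Compute Phase on its local shard of the 75 million points, and the Sync Phase would be an MPI_Allreduce over the per-cluster (sums, counts) pairs; here that reduction is purely local.

```python
import numpy as np

def kmeans_iteration(points, centroids):
    """One k-Means iteration, split into the phases named in the abstract.

    Serial sketch: with MPI, `points` would be a rank-local shard and the
    Sync Phase would MPI_Allreduce the (sums, counts) accumulators.
    """
    # Compute Phase: squared distance from every point to every centroid,
    # then the nearest-centroid assignment.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)

    # Sync Phase (stand-in for MPI_Allreduce): accumulate per-cluster
    # coordinate sums and member counts.
    k = centroids.shape[0]
    sums = np.zeros_like(centroids)
    counts = np.zeros(k)
    np.add.at(sums, labels, points)
    np.add.at(counts, labels, 1)

    # Update Phase: new centroid = cluster mean (keep old centroid if a
    # cluster received no points).
    nonempty = counts > 0
    new_centroids = centroids.copy()
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centroids, labels
```

Because only the fixed-size (sums, counts) arrays need to cross the network, communication cost per iteration is independent of the dataset size, which is why the Compute Phase dominates the runtime breakdown.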

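The strong-scaling results summarized above (runtime reduced nearly 10× from 2 to 48 processes) follow the standard speedup and parallel-efficiency definitions. A small sketch of that bookkeeping, using hypothetical runtimes rather than the paper's measured data:

```python
def strong_scaling_metrics(runtimes, base_procs=2):
    """Speedup and parallel efficiency relative to a baseline process count.

    runtimes: dict mapping MPI process count -> total runtime (seconds).
    Speedup S(p) = T(base) / T(p); efficiency E(p) = S(p) * base / p,
    so the baseline run has S = E = 1 by construction.
    """
    t_base = runtimes[base_procs]
    metrics = {}
    for p, t in sorted(runtimes.items()):
        s = t_base / t
        metrics[p] = {"speedup": s, "efficiency": s * base_procs / p}
    return metrics

# Hypothetical runtimes for illustration only (not the paper's figures):
example = strong_scaling_metrics({2: 100.0, 4: 52.0, 48: 10.5})
```

Normalizing efficiency by the 2-process baseline (rather than a serial run) matches the experimental design in the abstract, where 2 MPI processes is the smallest configuration measured.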
References
  1. Dimitrios Papakyriakou, Ioannis S. Barbounakis. Data Mining Methods: A Review. International Journal of Computer Applications. 183, 48 (Jan 2022), 5-19. DOI=10.5120/ijca2022921884
  2. Raspberry Pi 4 Model B. [Online]. Available: https://raspberrypi.com/products/raspberry-pi-4-model-b/
  3. Raspberry Pi 4 Model B specifications. [Online]. Available: https://magpi.raspberrypi.com/articles/raspberry-pi-4-specs-benchmarks
  4. Aurelien, M. (2022). PEP 668 – Marking Python base environments as externally managed. Python Software Foundation. https://peps.python.org/pep-0668/
  5. J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008
  6. M. Zaharia et al., "Apache Spark: A unified engine for big data processing," Commun. ACM, vol. 59, no. 11, pp. 56–65, Nov. 2016
  7. A. Sergeev and M. Del Balso, "Horovod: fast and easy distributed deep learning in TensorFlow," arXiv preprint arXiv:1802.05799, 2018
  8. Google, "Multi Worker Mirrored Strategy Guide," TensorFlow Docs, 2023
  9. W. Gropp, E. Lusk, and A. Skjellum, Using MPI: Portable Parallel Programming with the Message Passing Interface, 3rd ed., MIT Press, 2014
  10. J. Dongarra et al., "High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems," Int. J. High Perform. Comput. Appl., vol. 30, no. 1, pp. 3–10, Feb. 2016
  11. M. Rocklin, "Dask: Parallel computation with blocked algorithms and task scheduling," Proc. 14th Python in Science Conference, 2015
  12. Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S., & Stoica, I. (2010). Spark: Cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud'10). USENIX Association
  13. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., ... & Zheng, X. (2016). TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI '16) (pp. 265–283). USENIX Association
  14. Sergeev, A., & Del Balso, M. (2018). Horovod: fast and easy distributed deep learning in TensorFlow. In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS) Workshop
  15. Thakur, R., Rabenseifner, R., & Gropp, W. (2005). Optimization of collective communication operations in MPICH. In Proceedings of the International Conference on Computational Science (ICCS 2005) (pp. 49–57). Springer
  16. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107–113. https://doi.org/10.1145/1327452.1327492
  17. Gropp, W., Lusk, E., & Skjellum, A. (2014). Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press.
  18. Dongarra, J., Beckman, P., Moore, T., et al. (2021). The International Exascale Software Project Roadmap. International Journal of High-Performance Computing Applications, 35(1), 3–60
  19. Kogias, E., Christou, I. T., & Triantafyllidis, G. (2020). Distributed Machine Learning on Edge Devices: A Survey. IEEE Access, 8, 211309–211328
  20. Mariani, L., Bartolini, A., Borghi, G., & Benini, L. (2022). Scalable Edge Machine Learning on Raspberry Pi Clusters. Future Generation Computer Systems, 128, 190–203
Index Terms

Computer Science
Information Sciences

Keywords

Raspberry Pi 4B, Beowulf Cluster, ARM Architecture, Parallel Computing, CPU-Bound Workload, k-Means Clustering, Message Passing Interface (MPI), MPICH, Memory-Conscious Scaling, Low-Cost Clusters, Synthetic Data Benchmarking, Execution Time Analysis, Distributed Systems, HPC Performance Evaluation