International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 32
Year of Publication: 2025
Authors: Dimitrios Papakyriakou, Ioannis S. Barbounakis
Dimitrios Papakyriakou, Ioannis S. Barbounakis. Parallel k-Means Benchmarking on a CPU-Bound Beowulf Cluster of Raspberry Pi Nodes: An MPI-based Scaling Analysis with CPU-Centric Performance Evaluation. International Journal of Computer Applications. 187, 32 (Aug 2025), 43-55. DOI=10.5120/ijca2025925585
This study presents an in-depth parallel benchmarking analysis of the k-Means clustering algorithm on a Beowulf cluster of Raspberry Pi 4B nodes, each equipped with 8 GB of RAM. Using MPI for distributed computation, it systematically evaluates the algorithm's strong-scaling behaviour on synthetic datasets of a fixed size (75 million two-dimensional points) while varying the number of MPI processes from 2 to 48 (two processes per node). The performance evaluation decomposes execution time into five key phases, namely data generation, parallel distance computation (Compute Phase), synchronization via MPI_Allreduce (Sync Phase), centroid updates (Update Phase), and the k-Means Phase, together with total runtime. Results confirm that the Compute Phase is the dominant contributor to total runtime, accounting for the majority of execution time in every configuration. Synchronization overhead increases moderately at intermediate process counts, a typical phenomenon in distributed systems, but remains manageable and does not offset the overall speedup gained from parallelization. The Beowulf cluster demonstrates excellent scalability and high parallel efficiency throughout the strong-scaling experiments, with total runtime reduced by nearly 10× when increasing from 2 to 48 MPI processes. Memory usage stays within physical RAM limits thanks to careful dataset partitioning, enabling large-scale processing on low-power ARM-based nodes. Overall, this work highlights the feasibility and efficiency of CPU-centric, memory-aware distributed machine learning on energy-efficient Raspberry Pi clusters. The proposed benchmarking framework provides a robust and reproducible foundation for analysing algorithmic performance, scalability, and resource utilization in lightweight distributed environments, in line with current trends in edge computing and resource-constrained high-performance computing.
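To make the Compute, Sync, and Update phases concrete, here is a minimal single-process sketch of one common data-parallel k-Means scheme, assuming each MPI rank holds a block of the points, computes per-cluster coordinate sums and counts locally, and then receives the global sums from an all-reduce. This is an illustration, not the authors' implementation: all function names are hypothetical, the ranks are simulated as list entries, and `allreduce_sum` only mimics what MPI_Allreduce with a sum operation would deliver to every rank.

```python
import math
import random

# Hypothetical helper names; allreduce_sum only simulates MPI_Allreduce(MPI_SUM)
# inside one process. Ranks are represented as entries in a list.

def partition(points, n_ranks):
    """Block-partition points into n_ranks near-equal contiguous slices (one per rank)."""
    base, extra = divmod(len(points), n_ranks)
    blocks, start = [], 0
    for r in range(n_ranks):
        size = base + (1 if r < extra else 0)
        blocks.append(points[start:start + size])
        start += size
    return blocks


def local_compute(block, centroids):
    """Compute Phase: nearest-centroid assignment plus per-cluster sums and counts."""
    k = len(centroids)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for p in block:
        j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        sums[j][0] += p[0]
        sums[j][1] += p[1]
        counts[j] += 1
    return sums, counts


def allreduce_sum(partials):
    """Sync Phase stand-in: element-wise global sum that every rank would receive."""
    k = len(partials[0][1])
    g_sums = [[0.0, 0.0] for _ in range(k)]
    g_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            g_sums[j][0] += sums[j][0]
            g_sums[j][1] += sums[j][1]
            g_counts[j] += counts[j]
    return g_sums, g_counts


def update(centroids, g_sums, g_counts):
    """Update Phase: move each centroid to its global mean; keep it if the cluster is empty."""
    return [
        (s[0] / n, s[1] / n) if n > 0 else tuple(old)
        for old, s, n in zip(centroids, g_sums, g_counts)
    ]


def kmeans_simulated(points, centroids, n_ranks, iters):
    """Run `iters` k-Means iterations with the data split across n_ranks simulated ranks."""
    blocks = partition(points, n_ranks)
    for _ in range(iters):
        partials = [local_compute(b, centroids) for b in blocks]  # each rank's local work
        g_sums, g_counts = allreduce_sum(partials)
        centroids = update(centroids, g_sums, g_counts)
    return centroids


if __name__ == "__main__":
    rng = random.Random(42)  # stand-in for the data-generation phase
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    pts += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(500)]
    init = [pts[0], pts[-1]]
    one = kmeans_simulated(pts, init, n_ranks=1, iters=10)
    many = kmeans_simulated(pts, init, n_ranks=6, iters=10)
    # The split should not change the centroids beyond floating-point summation order.
    print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
              for ca, cb in zip(one, many) for a, b in zip(ca, cb)))
```

The design point the sketch illustrates is that only k sums and k counts cross the network per iteration, regardless of how many points each rank holds, which is why the distance computation can dominate runtime while the all-reduce stays comparatively small. In a real MPI program, the list comprehension over `blocks` would become each rank's own local loop, and `allreduce_sum` would become a single all-reduce over the flattened sums and counts.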
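The scaling and memory statements can be made concrete with a little arithmetic. The sketch below computes speedup, strong-scaling efficiency relative to the 2-process baseline, and the raw per-process data size for 75 million two-dimensional points. The helper names are hypothetical, and it assumes 8-byte float64 coordinates (the storage type is not stated in the abstract), counts only the point array, and treats "nearly 10×" as exactly 10× using normalized runtimes rather than measured ones.

```python
# Strong-scaling and memory arithmetic for the setup described in the abstract.
# Assumptions (not from the paper): runtimes are normalized, and each coordinate
# is stored as an 8-byte float64 with no other per-point data counted.

def speedup(t_base, t_p):
    """Speedup of a run relative to the baseline run."""
    return t_base / t_p


def relative_efficiency(t_base, p_base, t_p, p):
    """Strong-scaling efficiency relative to a p_base-process baseline:
    E = (t_base * p_base) / (t_p * p); 1.0 means perfectly linear scaling."""
    return (t_base * p_base) / (t_p * p)


def bytes_per_process(n_points, dims, n_procs, bytes_per_value=8):
    """Raw coordinate storage each rank holds under an even block partition."""
    return n_points * dims * bytes_per_value / n_procs


N_POINTS, DIMS = 75_000_000, 2
PROCS_PER_NODE = 2

# "Nearly 10x" from 2 to 48 processes, taken as exactly 10x for illustration.
print(speedup(1.0, 0.1))                       # about 10
print(relative_efficiency(1.0, 2, 0.1, 48))    # about 0.42

for p in (2, 48):
    per_proc = bytes_per_process(N_POINTS, DIMS, p)
    per_node = per_proc * PROCS_PER_NODE
    print(f"{p:>2} processes: {per_proc / 1e6:.0f} MB per process, "
          f"{per_node / 1e9:.2f} GB per node")
```

Under these assumptions, the single node running both ranks of the 2-process configuration holds about 1.2 GB of raw coordinates, well below its 8 GB, and a 10× runtime reduction over a 24× increase in process count corresponds to a relative efficiency of roughly 42%. Real memory use will be higher once labels, communication buffers, and runtime overhead are included.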