Research Article

AutoScale-ML with HASA: A Docker-based Framework for Distributed AutoML Model Selection

by Md. Attaur Rahman Sofi, Mohd. Yousuf
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 121
Year of Publication: 2026
Authors: Md. Attaur Rahman Sofi, Mohd. Yousuf
10.5120/ijca57e5a1e8f472

Md. Attaur Rahman Sofi, Mohd. Yousuf. AutoScale-ML with HASA: A Docker-based Framework for Distributed AutoML Model Selection. International Journal of Computer Applications. 187, 121 (Jun 2026), 8-14. DOI=10.5120/ijca57e5a1e8f472

@article{ 10.5120/ijca57e5a1e8f472,
author = { Md. Attaur Rahman Sofi, Mohd. Yousuf },
title = { AutoScale-ML with HASA: A Docker-based Framework for Distributed AutoML Model Selection },
journal = { International Journal of Computer Applications },
issue_date = { Jun 2026 },
volume = { 187 },
number = { 121 },
month = { Jun },
year = { 2026 },
issn = { 0975-8887 },
pages = { 8-14 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume187/number121/autoscale-ml-with-hasa-a-docker-based-framework-for-distributed-automl-model-selection/ },
doi = { 10.5120/ijca57e5a1e8f472 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2026-07-01T03:10:16.293161+05:30
%A Md. Attaur Rahman Sofi
%A Mohd. Yousuf
%T AutoScale-ML with HASA: A Docker-based Framework for Distributed AutoML Model Selection
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 121
%P 8-14
%D 2026
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents AutoScale-ML with HASA, a hierarchical adaptive search framework for automated machine learning (AutoML) model selection, implemented in a simulated seven-node distributed computing environment consisting of one master node and six independent Docker containers, each exposing a REST endpoint through Flask. Each worker trains a randomly assigned scikit-learn classifier (RandomForest, GradientBoosting, ExtraTrees, DecisionTree, or LogisticRegression) on a 50,000-sample synthetic classification dataset (50 features, 20 informative) generated with scikit-learn's make_classification, and returns its accuracy, training runtime, simulated network delay, and a composite score to a central master process. The master applies a three-phase Hierarchical Adaptive Search Algorithm (HASA): Phase 1 collects all six worker evaluations and retains the top-4 by composite score; Phase 2 re-ranks those four candidates and retains the top-2; Phase 3 selects the single best model by maximum composite score. Experimental results, including per-model benchmarks, phase-by-phase HASA traces, a sensitivity analysis of the penalty coefficient, and a characterisation of network delay, show that runtime-aware hierarchical selection lets the framework trade prediction accuracy against computational efficiency. Evaluation across the five classifier families reveals that the composite scoring function heavily penalises the training time of ensemble models, often favouring lightweight models over higher-accuracy alternatives. The penalty coefficient α is shown to be a critical, first-class configuration parameter that must be calibrated to the deployment context. The findings highlight the framework's usefulness as a reproducible baseline for containerised AutoML experimentation.
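The master-side selection described in the abstract (six worker reports, then top-4, top-2, and a single winner) can be sketched as below. This is a minimal, hypothetical Python sketch, not the authors' code: the names `WorkerResult`, `composite_score` and `hasa_select` are invented for illustration, the Docker/Flask transport is omitted (worker reports are passed in as plain records), and the composite score is assumed to be accuracy − α·runtime because the abstract does not state the exact formula. Network delay is carried in each record but, under this assumption, not scored.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class WorkerResult:
    """One worker's report to the master (fields follow the abstract)."""
    worker_id: str
    model_name: str
    accuracy: float   # held-out accuracy in [0, 1]
    runtime_s: float  # training wall-clock time in seconds
    delay_s: float    # simulated network delay in seconds (logged, not scored here)


def composite_score(r: WorkerResult, alpha: float) -> float:
    # ASSUMPTION: a linear runtime penalty, score = accuracy - alpha * runtime.
    # The paper's exact composite formula is not reproduced here.
    return r.accuracy - alpha * r.runtime_s


def hasa_select(
    results: Sequence[WorkerResult],
    alpha: float,
    keep: Tuple[int, ...] = (4, 2, 1),
) -> Tuple[WorkerResult, List[List[str]]]:
    """Hierarchical top-k filtering: all workers -> top-4 -> top-2 -> best."""
    if not keep or len(results) < keep[0]:
        raise ValueError("need at least keep[0] worker results")
    candidates = list(results)
    trace: List[List[str]] = []
    for k in keep:
        # Re-rank the surviving candidates (highest score first) and keep the top k.
        # sorted() is stable, so tied scores keep their original worker order.
        candidates = sorted(
            candidates, key=lambda r: composite_score(r, alpha), reverse=True
        )[:k]
        trace.append([r.worker_id for r in candidates])
    return candidates[0], trace


if __name__ == "__main__":
    # Illustrative numbers only; these are NOT results from the paper.
    demo = [
        WorkerResult("w1", "RandomForest", 0.92, 20.0, 0.05),
        WorkerResult("w2", "GradientBoosting", 0.94, 60.0, 0.05),
        WorkerResult("w3", "ExtraTrees", 0.91, 8.0, 0.05),
        WorkerResult("w4", "DecisionTree", 0.80, 2.0, 0.05),
        WorkerResult("w5", "LogisticRegression", 0.84, 0.5, 0.05),
        WorkerResult("w6", "RandomForest", 0.92, 21.0, 0.05),
    ]
    for alpha in (0.0, 0.01):
        best, trace = hasa_select(demo, alpha)
        print(f"alpha={alpha}: {best.model_name} via {trace}")
```

With these made-up inputs, α = 0 selects the most accurate model (GradientBoosting), while α = 0.01 lets the lightweight LogisticRegression win, which illustrates why the abstract treats α as a parameter to calibrate per deployment. Because this sketch scores each candidate once, the successive top-k cuts pick the same winner as a single argmax; the per-phase trace mainly serves as an audit log, and a variant that re-queries workers between phases would plug in at the re-ranking step.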

Index Terms

Computer Science
Information Sciences

Keywords

AutoML; Hierarchical Search; Flask REST; Docker Compose; Scikit-learn; Model Selection; Containerised ML; Distributed Computing; HASA; Composite Scoring; Penalty Coefficient