Intelligent Flaky Test Detection using Historical Failure Patterns: An AI-Driven Approach to Enhance Software Reliability

Pradeepkumar Palanisamy

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Number 23

Year of Publication: 2025

Authors: Pradeepkumar Palanisamy

10.5120/ijca2025925458

Pradeepkumar Palanisamy. Intelligent Flaky Test Detection using Historical Failure Patterns: An AI-Driven Approach to Enhance Software Reliability. International Journal of Computer Applications. 187, 23 (Jul 2025), 37-43. DOI=10.5120/ijca2025925458

@article{ 10.5120/ijca2025925458,
  author = { Pradeepkumar Palanisamy },
  title = { Intelligent Flaky Test Detection using Historical Failure Patterns: An AI-Driven Approach to Enhance Software Reliability },
  journal = { International Journal of Computer Applications },
  issue_date = { Jul 2025 },
  volume = { 187 },
  number = { 23 },
  month = { Jul },
  year = { 2025 },
  issn = { 0975-8887 },
  pages = { 37-43 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume187/number23/intelligent-flaky-test-detection-using-historical-failure-patterns-an-ai-driven-approach-to-enhance-software-reliability/ },
  doi = { 10.5120/ijca2025925458 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2025-07-26T00:56:08.190708+05:30
%A Pradeepkumar Palanisamy
%T Intelligent Flaky Test Detection using Historical Failure Patterns: An AI-Driven Approach to Enhance Software Reliability
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 23
%P 37-43
%D 2025
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The growing complexity of modern software systems, together with faster Continuous Integration/Continuous Deployment (CI/CD) pipelines, has worsened the problem of flaky tests: tests that fail non-deterministically, undermining developer confidence and slowing releases. This paper introduces an AI-driven framework that identifies, diagnoses, and mitigates flaky test failures by analyzing historical CI/CD data together with external contextual signals. The framework uses an ensemble of machine learning models, including deep learning architectures for temporal pattern recognition and graph neural networks for dependency analysis, to isolate the root causes of flakiness. Beyond detection, the system applies Explainable AI (XAI) techniques to give transparent insight into failure mechanisms and proposes remediation strategies, ranging from automated test quarantine and dynamic test re-prioritization to prescriptive recommendations for test refactoring or code modification. By continuously learning from evolving failure patterns, the models aim to improve the stability and throughput of software delivery pipelines. They also provide up-to-date insight into test reliability trends, supporting data-driven decision-making and proactive quality assurance.
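To make the idea of mining historical failure patterns concrete, the following is a minimal sketch of a simple baseline heuristic. It is not the framework described in the abstract, which relies on learned models. It assumes each CI run is recorded with a test identifier, a commit, and a pass/fail outcome, and it scores a test by the fraction of commits on which it both passed and failed. The names `RunRecord`, `flakiness_scores`, and `quarantine_candidates`, and the 0.2 threshold, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One historical CI execution of a test (hypothetical schema)."""
    test_id: str   # name of the test
    commit: str    # revision the run executed against
    passed: bool   # outcome of this run


def flakiness_scores(runs):
    """Return, per test, the fraction of commits with mixed outcomes.

    A test that both passes and fails on the same commit changed outcome
    without any code change, a common working definition of flakiness.
    """
    outcomes = defaultdict(lambda: defaultdict(set))
    for run in runs:
        outcomes[run.test_id][run.commit].add(run.passed)

    scores = {}
    for test_id, by_commit in outcomes.items():
        mixed = sum(1 for seen in by_commit.values() if len(seen) == 2)
        scores[test_id] = mixed / len(by_commit)
    return scores


def quarantine_candidates(runs, threshold=0.2):
    """Tests whose score meets the assumed threshold, sorted by name."""
    scores = flakiness_scores(runs)
    return sorted(t for t, s in scores.items() if s >= threshold)


history = [
    RunRecord("test_login", "a1", True),
    RunRecord("test_login", "a1", False),     # same commit, different outcome
    RunRecord("test_login", "b2", True),
    RunRecord("test_checkout", "a1", True),
    RunRecord("test_checkout", "b2", False),  # fails only after a code change
    RunRecord("test_checkout", "b2", False),
]

print(flakiness_scores(history))
print(quarantine_candidates(history))
```

In this toy history, `test_login` is flagged because it passed and failed on the same commit, while `test_checkout` is not, since its failure coincides with a code change and repeats consistently. A learned model of the kind the abstract describes would replace this fixed rule with features drawn from the same history.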

Index Terms

Computer Science

Information Sciences

Keywords

Flaky Tests, AI-based Testing, CI/CD, Test Stability, Machine Learning, Test Quarantine, Explainable AI, Graph Neural Networks, Temporal Pattern Analysis, Test Reliability, Causal Inference, Test Prioritization