Early Prediction of Software Defect using Ensemble Learning: A Comparative Study

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Ashraf Sayed Abdou, Nagy Ramadan Darwish

Ashraf Sayed Abdou and Nagy Ramadan Darwish. Early Prediction of Software Defect using Ensemble Learning: A Comparative Study. International Journal of Computer Applications 179(46):29-40, June 2018.

BibTeX:
	author = {Ashraf Sayed Abdou and Nagy Ramadan Darwish},
	title = {Early Prediction of Software Defect using Ensemble Learning: A Comparative Study},
	journal = {International Journal of Computer Applications},
	issue_date = {June 2018},
	volume = {179},
	number = {46},
	month = {Jun},
	year = {2018},
	issn = {0975-8887},
	pages = {29-40},
	numpages = {12},
	doi = {10.5120/ijca2018917185},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Early prediction of software defects using machine learning techniques has recently attracted increasing attention from researchers because of its importance in producing successful software. It also reduces the cost of software development and makes it easier to identify the reasons that determine the percentage of defect-prone software in the future. There is no conclusive evidence that any specific type of machine learning technique is more efficient and accurate than the others at predicting software defects. However, some previous related work proposes ensemble learning techniques as a more accurate alternative. This paper applies the resample technique with three types of ensemble learners (Boosting, Bagging and Rotation Forest), using eight base learners tested on seven benchmark datasets from the PROMISE repository. The results indicate that ensemble techniques improve accuracy over single learners, particularly Rotation Forest combined with the resample technique, for most of the algorithms used in the experiments.
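The kind of comparison the abstract describes (a class-rebalancing resample step followed by single versus ensemble learners) can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption for illustration, not the authors' experimental setup: the data are synthetic rather than a PROMISE dataset, a one-feature decision stump stands in for the eight base learners, Bagging and AdaBoost.M1 (one common form of Boosting) are written from scratch, and Rotation Forest is omitted because it needs an additional PCA step. The hypothetical `resample` function draws a sample with replacement whose class mix moves from the original distribution (`bias_to_uniform=0.0`) toward a uniform one (`bias_to_uniform=1.0`), similar in spirit to the bias-to-uniform option of WEKA's supervised Resample filter. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical helpers for illustration only; synthetic data, not PROMISE.


def resample(X, y, bias_to_uniform=1.0, seed=1):
    """Draw len(y) rows with replacement; bias 0.0 keeps the original class
    mix, bias 1.0 gives every class an equal share."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    Xs, ys = [], []
    for label in sorted(by_class):
        rows = by_class[label]
        share = ((1 - bias_to_uniform) * len(rows) / len(y)
                 + bias_to_uniform / len(by_class))
        for _ in range(round(share * len(y))):
            Xs.append(rng.choice(rows))
            ys.append(label)
    return Xs, ys


def fit_stump(X, y, w):
    """Base learner: the one-feature threshold split with the lowest weighted error."""
    labels = sorted(set(y))
    best_err, best = float("inf"), None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left in labels:
                for right in labels:
                    err = sum(wi for row, yi, wi in zip(X, y, w)
                              if (left if row[f] <= t else right) != yi)
                    if err < best_err:
                        best_err, best = err, (f, t, left, right)
    return best


def predict_stump(stump, row):
    f, t, left, right = stump
    return left if row[f] <= t else right


def bagging(X, y, n_models=10, seed=1):
    """Bagging: fit each stump on a bootstrap sample, predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(y)) for _ in y]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], [1.0] * len(y)))
    return lambda row: Counter(predict_stump(m, row) for m in models).most_common(1)[0][0]


def adaboost_m1(X, y, n_rounds=10):
    """AdaBoost.M1: upweight misclassified rows, predict by weighted vote."""
    w = [1.0 / len(y)] * len(y)
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        miss = [predict_stump(stump, row) != yi for row, yi in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m)
        if err == 0:  # a perfect stump decides on its own
            return lambda row: predict_stump(stump, row)
        if err >= 0.5:  # no better than chance: stop adding learners
            break
        ensemble.append((stump, math.log((1 - err) / err)))  # vote weight
        w = [wi * (1 - err) / err if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    if not ensemble:  # fall back to the last stump if none qualified
        ensemble.append((stump, 1.0))

    def predict(row):
        votes = Counter()
        for s, alpha in ensemble:
            votes[predict_stump(s, row)] += alpha
        return votes.most_common(1)[0][0]
    return predict


def make_data(n, seed):
    """Synthetic, imbalanced stand-in for a defect dataset: label 1
    ("defective") when two metric-like features are jointly large."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [int(a + b > 1.4) for a, b in X]
    return X, y


def accuracy(predict, X, y):
    return sum(predict(row) == yi for row, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    X_train, y_train = make_data(200, seed=1)
    X_test, y_test = make_data(100, seed=2)
    # Rebalance the training split only; the test split keeps its natural mix.
    X_bal, y_bal = resample(X_train, y_train, bias_to_uniform=1.0)
    stump = fit_stump(X_bal, y_bal, [1.0] * len(y_bal))
    print("single stump:", accuracy(lambda r: predict_stump(stump, r), X_test, y_test))
    print("bagging     :", accuracy(bagging(X_bal, y_bal), X_test, y_test))
    print("AdaBoost.M1 :", accuracy(adaboost_m1(X_bal, y_bal), X_test, y_test))
```

Running the script prints the held-out accuracy of the single stump and of the two ensembles; the numbers depend on the synthetic data and seeds and say nothing about the results reported in the paper. Only the training split is rebalanced, so the accuracy estimate is still measured on the original, imbalanced class distribution.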

Keywords: Software Defects, Ensemble Methods, Resample Technique, Base Learner, Bagging, Boosting, Rotation Forest.