Research Article

Comparison of Performance of Decision Tree Algorithms and Random Forest: An Application on OECD Countries Health Expenditures

by Songul Cinaroglu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 138 - Number 1
Year of Publication: 2016
Authors: Songul Cinaroglu
10.5120/ijca2016908704

Songul Cinaroglu. Comparison of Performance of Decision Tree Algorithms and Random Forest: An Application on OECD Countries Health Expenditures. International Journal of Computer Applications 138, 1 (March 2016), 37-41. DOI=10.5120/ijca2016908704

@article{ 10.5120/ijca2016908704,
author = { Songul Cinaroglu },
title = { Comparison of Performance of Decision Tree Algorithms and Random Forest: An Application on OECD Countries Health Expenditures },
journal = { International Journal of Computer Applications },
issue_date = { March 2016 },
volume = { 138 },
number = { 1 },
month = { March },
year = { 2016 },
issn = { 0975-8887 },
pages = { 37-41 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume138/number1/24346-2016908704/ },
doi = { 10.5120/ijca2016908704 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Songul Cinaroglu
%T Comparison of Performance of Decision Tree Algorithms and Random Forest: An Application on OECD Countries Health Expenditures
%J International Journal of Computer Applications
%@ 0975-8887
%V 138
%N 1
%P 37-41
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Decision trees and Random Forest are among the most popular machine learning techniques. C4.5, an extension of the ID3 algorithm, and CART are two of the most commonly used algorithms for generating decision trees. Random Forest, which constructs a large number of trees, is another useful technique for solving both classification and regression problems. This study compares the classification performance of two decision tree algorithms (C4.5, CART) and a Random Forest built with 50 trees. The data come from OECD countries' health expenditures for the year 2011. AUC values and ROC curve graphs were used for the performance comparison. Experimental results show that Random Forest achieved the highest classification accuracy (AUC = 0.98), compared with CART (0.95) and C4.5 (0.90). Future studies should focus more on comparing the performance of different machine learning techniques across several datasets and with different hyperparameter optimization techniques.
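
As a rough illustration of the comparison the abstract describes, the sketch below trains a CART tree and a 50-tree Random Forest with scikit-learn and reports their test-set AUC values. It is not the author's original code: the study's C4.5 learner has no scikit-learn implementation, and the OECD 2011 health-expenditure data are not reproduced here, so a synthetic binary-classification dataset stands in as a placeholder.

    # Minimal sketch, assuming scikit-learn; CART and a 50-tree Random Forest
    # are compared by AUC, as in the abstract. C4.5 (e.g. WEKA's J48) is omitted
    # because scikit-learn does not provide it.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder data standing in for the OECD 2011 health-expenditure set.
    X, y = make_classification(n_samples=300, n_features=8, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    models = {
        "CART": DecisionTreeClassifier(random_state=0),
        "Random Forest (50 trees)": RandomForestClassifier(
            n_estimators=50, random_state=0),
    }

    for name, model in models.items():
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]  # positive-class probability
        print(f"{name}: AUC = {roc_auc_score(y_test, scores):.2f}")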

Index Terms

Computer Science
Information Sciences

Keywords

Pattern Recognition, Machine Learning, Decision Trees, Health Expenditures.