CFP last date
20 May 2024
Reseach Article

An Approach towards Optimizing Random Forest using Dynamic Programming Algorithm

by Vrushali Y. Kulkarni, Aashu Singh, Pradeep K Sinha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 75 - Number 16
Year of Publication: 2013
Authors: Vrushali Y. Kulkarni, Aashu Singh, Pradeep K Sinha
10.5120/13193-0785

Vrushali Y. Kulkarni, Aashu Singh, Pradeep K Sinha . An Approach towards Optimizing Random Forest using Dynamic Programming Algorithm. International Journal of Computer Applications. 75, 16 ( August 2013), 9-16. DOI=10.5120/13193-0785

@article{ 10.5120/13193-0785,
author = { Vrushali Y. Kulkarni, Aashu Singh, Pradeep K Sinha },
title = { An Approach towards Optimizing Random Forest using Dynamic Programming Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 75 },
number = { 16 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 9-16 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume75/number16/13193-0785/ },
doi = { 10.5120/13193-0785 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:44:25.254563+05:30
%A Vrushali Y. Kulkarni
%A Aashu Singh
%A Pradeep K Sinha
%T An Approach towards Optimizing Random Forest using Dynamic Programming Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 75
%N 16
%P 9-16
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Random Forest (RF) is an ensemble supervised machine learning technique. Based on bagging and random feature selection, number of decision trees (base classifiers) is generated and majority voting is taken among them. The size of RF is subjective and varies from one dataset to another. Furthermore due to the randomization induced during creation, and its huge size, RF has at best been described as black-box. Various changes based on the experimental results have been proposed in the algorithm since then to optimize the performance of RF. To this end, we aim to find a subset, having accuracy comparable to original RF but having a much smaller size. In this paper, we show that the problem of selection of optimal subset of random forest follows the dynamic programming paradigm. Applying this approach to various UCI data-sets, corresponding subsets are obtained and studied. We conclude that such subsets do exist and that they are not unique. Moreover the size of these subsets is small fraction of the original RF (in the range of tens) and that accuracy of these subsets is a discrete valued function over its range.

References
  1. Leo Brieman, "Random Forests", Machine Learning, 45, 5-32, (2001)
  2. Ho, T. : "The random subspace method for constructing decision forests" – IEEE Transactions on Pattern Analysis and Machine Intelligence (1998)
  3. Breiman, L. : "Bagging Predictors" – Machine Learning (1996)
  4. Simon Bernard, Laurent Heutte, and Sebastien Adam, "On the Selection of Decision Trees in Random Forest", Proceedings of International Joint Cobference on Neural Networks, Atlanta, Georgia, USA, June 14-19, pp 302-307, (2009)
  5. Bernard, S. , Heutte, L. , Adam, S. : "Towards a Better Understanding of Random Forests Through the Study of Strength and Correlation" – 2009
  6. Heping Zhang, Minghui Wang, "Search for the smallest Random Forest", Statistics and Its Interface Volume 2, pp 381-388, (2009)
  7. Boinee P, Angelis A and Foresti G, Meta Random Forest, International Journal of Computational Intelligence 2, (2006)
  8. Tsymbal A, Pechenizkiy M and Cunningham P, Dynamic Integration with Random Forest, ECML, LNAI, 801-808, Springer-Verlag (2006)
  9. Robnik M, Sikonja, Improving Random Forests, J F Boulicaut et al (eds): Machine Learning, ECML 2004 Proceedings, Springer, Berlin, (2004)
  10. E Tripoli, D Fotiadis, G Manis, " Dynamic Construction of Random Forests: Evaluation using Biomedical Engineering Problems", IEEE 2010
  11. Vrushali Kulkarni, Pradeep Sinha, Aashu Singh,: Heuristic Based Improvements for Effective Random Forest Classifier, Proceedings of International Conference on Computational Intelligence (published by Springer), Chennai, India, 2012
  12. I. H. Witten, E. Frank, Weka : Practical machine learning tools and techniques, Morgan Kaufmann publisher, (2005)
  13. Grahn H, Lavesson N, Lapajne M and Slat D, A CUDA implementation of Random Forest Early Results, Master Thesis Software Engineering,School of Computing, Blekinge Institute of Technology, Sweden
  14. Saffari A, Leistner C, Santner J, Godec M and Bischof H, On-line Random Forests, ICCV IEEE, Conference Proceedings 1393-1400, (2009)
  15. Wang A, Wan G, Cheng Z and Li S, An Incremental Extremely Random Forest Classifier for Online Learning and Tracking, 16th IEEE International Conference on Image Processing,1449-1452, (2009)
  16. Chain C, Liaw A and Breiman L, Using Random forest to Learn Imbalanced Data, Technical Report, Department of Statistics, U. C. Berkley (2004)
  17. Kosorok M and Ma S, Marginal Asymptotics for the Large p Small n paradigm: With Applications to Microarray Data, Ann Statist 35, 1456-1486, (2007)
  18. Bonissone P, Cadenas J, Garrido M and Diaz-Valladares R, A Fuzzy Random Forest, International Journal of Approximate Reasoning, 51, 729-747, (2010)
  19. Leistner C, Saffari A, Santner J, Godec M and Bischof H, Semi- Supervised Random Forests, ICCV IEEE, Conference Proceedings, 506-513 (2009)
  20. Patrice, L. , Olivier, D. , Christine, D. : "Limiting the Number of Trees in Random Forests" – 2nd International Workshop on Multiple Classifier Systems – MCS, Cambridge, UK (2001)
  21. Cormen, T. , Leiserson, C. , Rivest, R. , Stein, C. : "Introduction To Algorithms" – 2nd Edition.
  22. Vrushali Y Kulkarni, Pradeep K Sinha, "Random Forest Classifiers: A Survey and Future research Directions", International Journal of Advanced Computing, Vol 36, Issue 1, 1144-1153 (2013)
Index Terms

Computer Science
Information Sciences

Keywords

Ensemble classifier Random Forest Decision tree Dynamic Programming Optimal subset