Research Article

Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees

by S. Bharathidason, C. Jothi Venkataeswaran
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 101 - Number 13
Year of Publication: 2014
DOI: 10.5120/17749-8829

S. Bharathidason, C. Jothi Venkataeswaran. Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees. International Journal of Computer Applications. 101, 13 (September 2014), 26-30. DOI=10.5120/17749-8829

@article{ 10.5120/17749-8829,
author = { S. Bharathidason and C. Jothi Venkataeswaran },
title = { Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees },
journal = { International Journal of Computer Applications },
issue_date = { September 2014 },
volume = { 101 },
number = { 13 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 26-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume101/number13/17749-8829/ },
doi = { 10.5120/17749-8829 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A S. Bharathidason
%A C. Jothi Venkataeswaran
%T Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees
%J International Journal of Computer Applications
%@ 0975-8887
%V 101
%N 13
%P 26-30
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Random forest achieves high classification performance by combining an ensemble of decision trees, each grown on randomly selected subspaces of the data. The performance of an ensemble learner depends strongly on the accuracy of each component learner and on the diversity among the components. In a random forest, randomization can produce poorly performing trees, and the ensemble may also include correlated trees. Both effects can lead to poor ensemble classification decisions. This paper attempts to improve the model by including only uncorrelated, high-performing trees in the random forest. Experimental results show that this further improves the classification accuracy of the random forest.
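The idea of keeping only accurate, mutually uncorrelated trees can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not give the selection rule, so the greedy ranking, the correlation measure (Pearson correlation between per-sample correctness indicators), the thresholds `min_accuracy` and `max_correlation`, and the toy predictions are all assumptions made for illustration. The sketch takes each tree's predictions on a held-out set as given and does not train any trees.

```python
from collections import Counter

def accuracy(preds, labels):
    """Fraction of samples a tree classifies correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def correctness_correlation(preds_a, preds_b, labels):
    """Pearson correlation between two trees' correct/incorrect indicators.

    Assumption: a tree with constant correctness (zero variance) is treated
    as uncorrelated with every other tree.
    """
    a = [int(p == y) for p, y in zip(preds_a, labels)]
    b = [int(p == y) for p, y in zip(preds_b, labels)]
    n = len(labels)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def select_trees(tree_preds, labels, min_accuracy=0.6, max_correlation=0.5):
    """Greedily keep accurate trees that are weakly correlated with those kept.

    Both thresholds are hypothetical values, not taken from the paper.
    """
    ranked = sorted(range(len(tree_preds)),
                    key=lambda i: accuracy(tree_preds[i], labels),
                    reverse=True)
    selected = []
    for i in ranked:
        if accuracy(tree_preds[i], labels) < min_accuracy:
            break  # every remaining tree is less accurate still
        if all(correctness_correlation(tree_preds[i], tree_preds[j], labels)
               <= max_correlation for j in selected):
            selected.append(i)
    return selected

def majority_vote(tree_preds, selected):
    """Combine the selected trees by plurality vote on each sample."""
    n_samples = len(tree_preds[0])
    return [Counter(tree_preds[i][k] for i in selected).most_common(1)[0][0]
            for k in range(n_samples)]

# Toy held-out labels and per-tree predictions (made up for illustration).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
tree_preds = [
    [1, 0, 1, 1, 0, 0, 1, 1],  # tree 0: accurate
    [1, 0, 1, 1, 0, 0, 1, 1],  # tree 1: duplicate of tree 0 (fully correlated)
    [1, 0, 0, 1, 0, 0, 1, 0],  # tree 2: accurate, errs on a different sample
    [0, 1, 0, 1, 1, 0, 0, 0],  # tree 3: poor performer
    [1, 1, 1, 1, 0, 0, 1, 0],  # tree 4: accurate, errs on a different sample
]

selected = select_trees(tree_preds, labels)
ensemble = majority_vote(tree_preds, selected)
print("selected trees:", selected)
print("ensemble accuracy:", accuracy(ensemble, labels))
```

On this toy data the duplicate tree and the weak tree are dropped, and the three remaining trees err on different samples, so the vote corrects each individual mistake. In practice the selection set should be separate from the final test set; otherwise the reported accuracy is optimistic.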

Index Terms

Computer Science
Information Sciences

Keywords

Strength, Correlation, Tree Performance, Decision Trees