Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 101 - Number 13
Year of Publication: 2014
S. Bharathidason
C. Jothi Venkataeswaran

S. Bharathidason and C. Jothi Venkataeswaran. Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees. International Journal of Computer Applications 101(13):26-30, September 2014.

@article{bharathidason2014improving,
	author = {S. Bharathidason and C. Jothi Venkataeswaran},
	title = {Improving Classification Accuracy based on Random Forest Model with Uncorrelated High Performing Trees},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {101},
	number = {13},
	pages = {26-30},
	month = {September}
}

Random forests achieve high classification performance through an ensemble of decision trees, each grown on a randomly selected subspace of the data. The performance of an ensemble learner depends strongly both on the accuracy of each component learner and on the diversity among those components. In a random forest, randomization can produce poorly performing trees and highly correlated trees, which degrade the ensemble's classification decisions. In this paper an attempt is made to improve the performance of the model by retaining only uncorrelated, high-performing trees in the random forest. Experimental results show that the classification accuracy of the random forest can be further enhanced in this way.
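The pruning idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm: it scores each tree of a scikit-learn forest on held-out data, then greedily keeps trees that are both accurate and weakly correlated (measured here, as an assumption, by pairwise prediction agreement below a threshold) with the trees already selected.

```python
# Hedged sketch: prune a random forest to accurate, weakly correlated trees.
# The 0.95 agreement threshold and the cap of 30 trees are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Per-tree predictions on the validation set, and per-tree accuracy.
preds = np.array([tree.predict(X_val) for tree in forest.estimators_])
acc = (preds == y_val).mean(axis=1)

# Greedy selection: visit trees from most to least accurate; keep a tree only
# if its predictions agree with every already-selected tree less than 95% of
# the time (i.e. it adds diversity to the pruned ensemble).
selected = []
for i in np.argsort(-acc):
    if all((preds[i] == preds[j]).mean() < 0.95 for j in selected):
        selected.append(i)
    if len(selected) == 30:
        break

# Majority vote over the pruned ensemble.
vote = (preds[selected].mean(axis=0) > 0.5).astype(int)
pruned_acc = (vote == y_val).mean()
```

In practice the agreement threshold and ensemble size would be tuned, and out-of-bag predictions could replace the held-out split so that no training data is sacrificed for tree scoring.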
