Research Article

An Approach to Automation Selection of Decision Tree based on Training Data Set

by D. Saravanakumar, N. Ananthi, M. Devi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 64 - Number 21
Year of Publication: 2013
DOI: 10.5120/10755-5500

D. Saravanakumar, N. Ananthi, M. Devi. An Approach to Automation Selection of Decision Tree based on Training Data Set. International Journal of Computer Applications 64, 21 (February 2013), 1-4. DOI=10.5120/10755-5500

@article{ 10.5120/10755-5500,
author = { D. Saravanakumar, N. Ananthi, M. Devi },
title = { An Approach to Automation Selection of Decision Tree based on Training Data Set },
journal = { International Journal of Computer Applications },
issue_date = { February 2013 },
volume = { 64 },
number = { 21 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = { 4 },
url = { https://ijcaonline.org/archives/volume64/number21/10755-5500/ },
doi = { 10.5120/10755-5500 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:17:11.620476+05:30
%A D. Saravanakumar
%A N. Ananthi
%A M. Devi
%T An Approach to Automation Selection of Decision Tree based on Training Data Set
%J International Journal of Computer Applications
%@ 0975-8887
%V 64
%N 21
%P 1-4
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In data mining applications, very large training data sets with several million records are common. Decision trees are a powerful and widely used technique for both classification and prediction problems. Many decision tree construction algorithms have been proposed to handle large or small training data: some perform best on large data sets and others on small ones, each working best under its own criteria. Decision tree algorithms classify categorical and continuous attributes well, but most handle only smaller data sets efficiently and consume more time on large ones. Supervised Learning In Quest (SLIQ) and Scalable Parallelizable Induction of Decision Trees (SPRINT) handle very large data sets; however, SLIQ requires that the class labels be available in main memory beforehand, whereas SPRINT is designed for large data sets and removes these memory restrictions. This research work deals with the automatic selection of a decision tree algorithm based on the training data set size. The proposed system first estimates the training data set size using a mathematical measure, and the resulting size is checked against the available memory space. If sufficient memory is available, tree construction proceeds. After the data are classified, the accuracy of the classifier is estimated. The main advantages of the proposed method are that the system takes less time and avoids memory problems.
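The size-based selection described in the abstract can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: the paper's actual mathematical measure and thresholds are not given on this page, so the byte-per-value estimate, the function names, and the fallback to a SPRINT-style learner are all assumptions for illustration.

```python
def estimate_dataset_bytes(n_records, n_attributes, bytes_per_value=8):
    """Rough in-memory footprint of a training set of n_records x n_attributes.
    The 8-bytes-per-value figure is an illustrative assumption."""
    return n_records * n_attributes * bytes_per_value


def select_decision_tree_algorithm(n_records, n_attributes, available_memory_bytes):
    """Choose a tree-construction strategy by comparing the estimated
    training-set size against available main memory, mirroring the
    automatic selection idea in the abstract."""
    size = estimate_dataset_bytes(n_records, n_attributes)
    if size <= available_memory_bytes:
        # Fits in memory: an ordinary in-memory learner suffices.
        return "in-memory"
    # Too large for main memory: a scalable, disk-resident learner such as
    # SPRINT avoids SLIQ's requirement that class labels stay in memory.
    return "SPRINT"


# Example: 5 million records with 20 attributes against 512 MB of memory.
# The estimate is 5,000,000 * 20 * 8 = 800 MB, which exceeds 512 MB.
choice = select_decision_tree_algorithm(5_000_000, 20, 512 * 1024**2)
print(choice)  # -> SPRINT
```

Smaller data sets that fit within the memory budget would instead return the in-memory strategy, so the check degrades gracefully as data grows.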

References
  1. Amir Bar-Or, Daniel Keren, Assaf Schuster, and Ran Wolff, "Hierarchical Decision Tree Induction in Distributed Genomic Databases", IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 8, August 2005.
  2. Arun K. Pujari, "Data Mining Techniques", Universities Press, 2001.
  3. M. Banerjee and M. K. Chakraborty, "Rough Logics: A Survey with Further Directions", Rough Sets Analysis, Physica-Verlag, Heidelberg, 1997.
  4. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, "Classification and Regression Trees", Wadsworth, Belmont, 1984.
  5. J. Bala, J. Huang, H. Vafaie, K. DeJong, and H. Wechsler, "Hybrid Learning Using Genetic Algorithms and Decision Trees for Pattern Classification", 2003.
  6. Carla E. Brodley and Paul E. Utgoff, "Multivariate versus Univariate Decision Trees", COINS Technical Report 92-8, January 1992.
  7. Andrew B. Nobel, "Analysis of a Complexity-Based Pruning Scheme for Classification Trees", IEEE Transactions on Information Theory, vol. 48, pp. 2362-2368, 2002.
  8. Rakesh Agrawal, Tomasz Imielinski, and Arun Swami, "Database Mining: A Performance Perspective", IEEE Transactions on Knowledge and Data Engineering, 5(6):914-925, December 1993.
  9. Donato Malerba, Floriana Esposito, and Giovanni Semeraro, "A Further Comparison of Simplification Methods for Decision-Tree Induction", Springer-Verlag, 1996.
  10. Floriana Esposito, Donato Malerba, and Giovanni Semeraro, "A Comparative Analysis of Methods for Pruning Decision Trees", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 5, May 1997.
  11. Johannes Gehrke, Raghu Ramakrishnan, and Venkatesh Ganti, "RainForest - A Framework for Fast Decision Tree Construction of Large Datasets", Proceedings of the 24th VLDB Conference, New York, USA, 1998.
  12. V. Corruble, D. E. Brown, and C. L. Pittard, "A Comparison of Decision Classifiers with Backpropagation Neural Networks for Multimodal Classification Problems", Pattern Recognition, 26:953-961, 1993.
  13. Deborah R. Carvalho and Alex A. Freitas, "A Hybrid Decision Tree/Genetic Algorithm for Coping with the Problem of Small Disjuncts in Data Mining", 2004.
  14. Haixun Wang and Carlo Zaniolo, "CMP: A Fast Decision Tree Classifier Using Multivariate Predictions".
  15. D. Hand, H. Mannila, and P. Smyth, "Principles of Data Mining", MIT Press, Cambridge, MA, 2001.
Index Terms

Computer Science
Information Sciences

Keywords

Decision Tree Algorithm, Classification, Data Mining, Data Set