Statistical Approach for Predicting the Most Accurate Classification Algorithm for a Data Set in Analysis

Shriniwas Nayak; Aditya Mahaddalkar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 176 - Number 28

Year of Publication: 2020

10.5120/ijca2020920306

Shriniwas Nayak, Aditya Mahaddalkar. Statistical Approach for Predicting the Most Accurate Classification Algorithm for a Data Set in Analysis. International Journal of Computer Applications. 176, 28 (Jun 2020), 1-7. DOI=10.5120/ijca2020920306

@article{10.5120/ijca2020920306,
  author = {Shriniwas Nayak and Aditya Mahaddalkar},
  title = {Statistical Approach for Predicting the Most Accurate Classification Algorithm for a Data Set in Analysis},
  journal = {International Journal of Computer Applications},
  issue_date = {Jun 2020},
  volume = {176},
  number = {28},
  month = {Jun},
  year = {2020},
  issn = {0975-8887},
  pages = {1-7},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume176/number28/31373-2020920306/},
  doi = {10.5120/ijca2020920306},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

Classification algorithms, a category of data mining techniques, are used in almost every field that aims to predict an outcome class for a data instance. As a result, many supervised classification algorithms have been studied in machine learning, among them K-Nearest Neighbor, Gaussian Naive Bayes, and Decision Tree. Even so, deciding which algorithm best suits the data under consideration remains a time-consuming and complex task. This article discusses an approach that predicts the algorithm likely to produce the best accuracy for a given data set based on internal data parameters: size of the data, ratio of numerical attributes, count of outliers, average correlation, number of classes in the target, and average number of classes in the attributes. The paper analyses the relation between these internal data parameters and the performance of the K-Nearest Neighbor, Logistic Regression, Gaussian Naive Bayes, and Decision Tree classification algorithms, thereby evaluating a generic approach to determining the most accurate algorithm. It also examines some limitations, such as the inability to incorporate external factors, for example memory requirements.
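
The six internal data parameters named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the authors' exact definitions, so everything below is an assumption made for illustration: the function names (`meta_features`, `count_outliers`, and so on) are hypothetical, outliers are counted with the common 1.5 x IQR rule, average correlation is taken as the mean absolute Pearson correlation over pairs of numerical attributes, and "classes in attributes" is read as the number of distinct values in each non-numerical attribute.

```python
from itertools import combinations
from statistics import mean, quantiles


def is_numeric(values):
    """Treat a column as numerical if every value is an int or float (bools excluded)."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def count_outliers(values):
    """Count values outside the 1.5 * IQR fences (an assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = 1.5 * (q3 - q1)
    return sum(1 for v in values if v < q1 - spread or v > q3 + spread)


def meta_features(attributes, target):
    """Compute the six parameters for a data set.

    attributes: dict mapping attribute name -> list of values (one per row)
    target:     list of class labels (one per row)
    """
    numeric = {k: v for k, v in attributes.items() if is_numeric(v)}
    categorical = {k: v for k, v in attributes.items() if k not in numeric}
    pairs = list(combinations(numeric.values(), 2))
    return {
        "size": len(target),
        "numeric_ratio": len(numeric) / len(attributes),
        "outlier_count": sum(count_outliers(v) for v in numeric.values()),
        "avg_abs_correlation": (
            mean(abs(pearson(x, y)) for x, y in pairs) if pairs else 0.0
        ),
        "target_classes": len(set(target)),
        "avg_attribute_classes": (
            mean(len(set(v)) for v in categorical.values()) if categorical else 0.0
        ),
    }
```

A vector like this, computed per data set, is the kind of input a selection rule could use to choose among K-Nearest Neighbor, Logistic Regression, Gaussian Naive Bayes, and Decision Tree. How the paper maps these values to a recommended algorithm is not stated in the abstract and is not sketched here.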

Index Terms

Computer Science

Information Sciences

Keywords

Supervised Learning, Classification Algorithm, Decision Tree, Logistic Regression, K Nearest Neighbors, Naive Bayes