A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers

Kedar Potdar; Taher S. Pardawala; Chinmay D. Pai



International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 175 - Number 4

Year of Publication: 2017

Authors: Kedar Potdar, Taher S. Pardawala, Chinmay D. Pai

10.5120/ijca2017915495

Kedar Potdar, Taher S. Pardawala, Chinmay D. Pai. A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers. International Journal of Computer Applications. 175, 4 (Oct 2017), 7-9. DOI=10.5120/ijca2017915495

@article{10.5120/ijca2017915495,
  author = {Kedar Potdar and Taher S. Pardawala and Chinmay D. Pai},
  title = {A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers},
  journal = {International Journal of Computer Applications},
  issue_date = {Oct 2017},
  volume = {175},
  number = {4},
  month = {Oct},
  year = {2017},
  issn = {0975-8887},
  pages = {7-9},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume175/number4/28474-2017915495/},
  doi = {10.5120/ijca2017915495},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T00:24:08.584422+05:30
%A Kedar Potdar
%A Taher S. Pardawala
%A Chinmay D. Pai
%T A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers
%J International Journal of Computer Applications
%@ 0975-8887
%V 175
%N 4
%P 7-9
%D 2017
%I Foundation of Computer Science (FCS), NY, USA

Abstract

In classification analysis, the dependent variable is frequently influenced not only by ratio-scale variables but also by qualitative (nominal-scale) variables. Because machine learning algorithms accept only numerical inputs, these categorical variables must first be encoded as numerical values. This paper presents a comparative study of seven categorical variable encoding techniques for classification with Artificial Neural Networks on a categorical dataset. The Car Evaluation dataset provided by UCI is used for training. Results show that data encoded with Sum Coding and Backward Difference Coding give the highest accuracy compared with data pre-processed by the other techniques.
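
To make the two best-performing encodings concrete, the following is a minimal sketch of how Sum Coding and Backward Difference Coding map a nominal variable to numeric columns. It is not the authors' implementation: the function names, the level ordering, and the sample values are assumptions chosen for illustration, and the contrast matrices follow the standard textbook definitions of these schemes. One-hot encoding is included only as a familiar reference point.

```python
import numpy as np

def one_hot_matrix(k):
    # k levels -> k indicator columns (reference point only).
    return np.eye(k)

def sum_coding_matrix(k):
    # k levels -> k-1 columns. The first k-1 levels get an indicator row,
    # the last level gets a row of -1, so every column sums to zero.
    return np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])

def backward_difference_matrix(k):
    # k levels -> k-1 columns. Column j contrasts level j+1 with level j:
    # the first j levels receive -(k-j)/k and the remaining levels j/k.
    m = np.empty((k, k - 1))
    for j in range(1, k):
        m[:j, j - 1] = -(k - j) / k
        m[j:, j - 1] = j / k
    return m

def encode(values, levels, matrix):
    # Look up the contrast-matrix row for each observed category.
    index = {level: i for i, level in enumerate(levels)}
    rows = [index[v] for v in values]
    return matrix[rows]

# Hypothetical example: a four-level attribute with an assumed level order.
levels = ["low", "med", "high", "vhigh"]
sample = ["low", "vhigh", "med"]

sum_encoded = encode(sample, levels, sum_coding_matrix(len(levels)))
bd_encoded = encode(sample, levels, backward_difference_matrix(len(levels)))
print(sum_encoded)
print(bd_encoded)
```

Both schemes produce one column fewer than the number of levels, which keeps the network's input layer slightly smaller than with one-hot encoding; the resulting arrays can be concatenated across attributes to form the classifier's input.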


Index Terms

Computer Science

Information Sciences

Keywords

Machine Learning, Statistical Learning, Artificial Neural Networks, Data Preprocessing