Research Article

A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers

by Kedar Potdar, Taher S. Pardawala, Chinmay D. Pai
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 4
Year of Publication: 2017
Authors: Kedar Potdar, Taher S. Pardawala, Chinmay D. Pai
10.5120/ijca2017915495

Kedar Potdar, Taher S. Pardawala, Chinmay D. Pai. A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers. International Journal of Computer Applications. 175, 4 (Oct 2017), 7-9. DOI=10.5120/ijca2017915495

@article{ 10.5120/ijca2017915495,
author = { Kedar Potdar and Taher S. Pardawala and Chinmay D. Pai },
title = { A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers },
journal = { International Journal of Computer Applications },
issue_date = { Oct 2017 },
volume = { 175 },
number = { 4 },
month = { Oct },
year = { 2017 },
issn = { 0975-8887 },
pages = { 7-9 },
numpages = {3},
url = { https://ijcaonline.org/archives/volume175/number4/28474-2017915495/ },
doi = { 10.5120/ijca2017915495 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:24:08.584422+05:30
%A Kedar Potdar
%A Taher S. Pardawala
%A Chinmay D. Pai
%T A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers
%J International Journal of Computer Applications
%@ 0975-8887
%V 175
%N 4
%P 7-9
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In classification analysis, the dependent variable is frequently influenced not only by ratio-scale variables but also by qualitative (nominal-scale) variables. Machine learning algorithms accept only numerical inputs, so these categorical variables must be converted into numerical values using encoding techniques. This paper presents a comparative study of seven categorical variable encoding techniques for classification with Artificial Neural Networks on a categorical dataset. The Car Evaluation dataset provided by UCI is used for training. Results show that data encoded with the Sum Coding and Backward Difference Coding techniques give the highest accuracy compared with data pre-processed by the remaining techniques.
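As an illustration of the two techniques named in the abstract, the following is a minimal sketch of Sum Coding and Backward Difference Coding built from their standard contrast matrices. It is not the authors' implementation: the function names, the level ordering, and the sample values are assumptions chosen for the example, and only NumPy is used.

```python
import numpy as np

def sum_contrast(k):
    """Sum (deviation) coding: k levels -> k-1 columns.
    Level i (i < k) gets a 1 in column i; the last level gets -1 everywhere."""
    return np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])

def backward_difference_contrast(k):
    """Backward difference coding: k levels -> k-1 columns.
    Column j compares level j+1 with level j: the first j levels get
    -(k-j)/k and the remaining levels get j/k."""
    m = np.empty((k, k - 1))
    for j in range(1, k):
        m[:j, j - 1] = -(k - j) / k
        m[j:, j - 1] = j / k
    return m

def encode(values, levels, contrast):
    """Map each categorical value to its row of the contrast matrix."""
    index = {level: i for i, level in enumerate(levels)}
    return contrast[[index[v] for v in values]]

# Hypothetical ordered levels and sample values, for illustration only.
levels = ["low", "med", "high", "vhigh"]
sample = ["low", "vhigh", "med"]

sum_encoded = encode(sample, levels, sum_contrast(len(levels)))
bd_encoded = encode(sample, levels, backward_difference_contrast(len(levels)))
```

Both schemes produce k-1 numeric columns for a variable with k levels, and each column of either contrast matrix sums to zero across the levels. The resulting arrays can be concatenated across attributes to form the numeric input of a classifier.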

References
  1. “Types of Data & Measurement Scales.”, MyMarketResearchMethods (n.d.) Retrieved July 2017, from www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/.
  2. Gujarati, Damodar N., “Basic Econometrics.”, McGraw-Hill, 2004.
  3. K. Potdar and R. Kinnerkar, “A Non-linear Autoregressive Neural Network Model for Forecasting Indian Index of Industrial Production”, Proceedings of the IEEE Tensymp 2017, Kochi, India
  4. “What is Machine Learning.”, WhatIs (June 2017) Retrieved July 2017, from whatis.techtarget.com/definition/machine-learning.
  5. “Evolution of Machine Learning.”, SAS (n.d.) Retrieved July 2017, from www.sas.com/en_us/insights/analytics/machine-learning.html.
  6. Gregory Carey, (2003) Coding Categorical Variables, Retrieved July 2017, from psych.colorado.edu/~carey/courses/psyc5741/handouts/Coding%20Categorical%20Variables%202006-03-03.pdf.
  7. Brett Lantz, “Machine Learning with R”, Packt Publishing Limited, 2013. ISBN - 978-1782162148.
  8. Von Eye, Alexander, and Clifford C. Clogg, eds. “Categorical variables in developmental research: Methods of analysis.” Elsevier, 1996.
  9. “R Library Contrast Coding Systems for Categorical Variables”, UCLA (n.d.) Retrieved July 2017, from stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/
  10. Saravanan K and S. Sasithra, “REVIEW ON CLASSIFICATION BASED ON ARTIFICIAL NEURAL NETWORKS”, International Journal of Ambient Systems and Applications (IJASA) Vol.2, No.4, December 2014.
  11. M. Bohanec and V. Rajkovic, “Knowledge acquisition and explanation for multi-attribute decision making.” In 8th Intl. Workshop on Expert Systems and their Applications, Avignon, France. pages 59-78, 1988.
  12. B. Zupan, M. Bohanec, I. Bratko, J. Demsar, “Machine learning by function decomposition.” ICML-97, Nashville, TN. 1997 (to appear).
  13. Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Index Terms

Computer Science
Information Sciences

Keywords

Machine Learning, Statistical Learning, Artificial Neural Networks, Data Preprocessing