Research Article

An Empirical Evaluation of AdaBoost Extensions for Cost-Sensitive Classification

by Ankit Desai, P. M. Jadav
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 44 - Number 13
Year of Publication: 2012
Authors: Ankit Desai, P. M. Jadav
10.5120/6325-8677

Ankit Desai, P. M. Jadav. An Empirical Evaluation of AdaBoost Extensions for Cost-Sensitive Classification. International Journal of Computer Applications 44, 13 (April 2012), 34-41. DOI=10.5120/6325-8677

@article{ 10.5120/6325-8677,
author = { Ankit Desai, P. M. Jadav },
title = { An Empirical Evaluation of AdaBoost Extensions for Cost-Sensitive Classification },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 44 },
number = { 13 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 34-41 },
numpages = { 8 },
url = { https://ijcaonline.org/archives/volume44/number13/6325-8677/ },
doi = { 10.5120/6325-8677 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Ankit Desai
%A P. M. Jadav
%T An Empirical Evaluation of AdaBoost Extensions for Cost-Sensitive Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 44
%N 13
%P 34-41
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Classification is a data mining technique used to predict group membership for data instances. Cost-sensitive classification is a relatively new field of research in the data mining and machine learning communities. Cost-sensitive classifiers perform classification under a cost-based model rather than the usual error-based model. AdaBoost, an error-based classifier, is a simple algorithm that reweights the training instances to build multiple classifiers during the training phase, without considering the cost of misclassification. During classification, it collects a weighted vote from each classifier generated in training and assigns the new example to the class with the maximum total vote. Intuitively, combining multiple models should give more robust predictions than a single model in settings where misclassification costs matter. Boosting has been shown to be an effective method of combining multiple models to enhance the predictive accuracy of a single model, so it is natural to expect that boosting might also reduce misclassification costs. In this paper, the existing cost-sensitive boosters are studied, five new extensions are proposed, and their results are compared. A few future extensions are also outlined.
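
The abstract describes AdaBoost's reweight-and-vote scheme, but this page does not specify the update rules of the five CSExtension variants. The minimal Python sketch below (assuming numpy and scikit-learn are available) illustrates the general idea of injecting misclassification costs into AdaBoost's weight update, in the spirit of AdaCost [11]; the cost factor applied to misclassified examples is an assumption for illustration, not the authors' exact rule.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cost_sensitive_adaboost(X, y, costs, n_rounds=50):
    """y in {-1, +1}; costs[i] = misclassification cost of example i.
    Illustrative cost-sensitive reweighting, NOT the paper's CSExtension rules."""
    n = len(y)
    w = np.ones(n) / n                       # uniform initial weight distribution
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # weak learner trained on current weights
        pred = stump.predict(X)
        err = w[pred != y].sum()             # weighted training error
        if err == 0 or err >= 0.5:           # weak-learner sanity check
            break
        alpha = 0.5 * np.log((1 - err) / err)
        # Cost-sensitive twist (assumed): misclassified examples are
        # up-weighted in proportion to their misclassification cost.
        factor = np.where(pred != y, costs, 1.0)
        w *= factor * np.exp(-alpha * y * pred)
        w /= w.sum()                         # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)

    def classify(X_new):
        # Weighted vote over all weak classifiers, as in standard AdaBoost.
        votes = np.zeros(len(X_new))
        for a, s in zip(alphas, stumps):
            votes += a * s.predict(X_new)
        return np.sign(votes)

    return classify

For example, passing costs = np.where(y == 1, 2.0, 1.0) doubles the penalty for misclassifying positive examples, pushing the ensemble toward fewer high-cost errors at the expense of more low-cost ones.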

References
  1. Tao Wang, Zhenxing Qin, Zhi Jin and Shichao Zhang, "Handling overfitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning", The Journal of Systems and Software, 2010.
  2. Susan Lomax and Sunil Vadera, "An empirical comparison of cost-sensitive decision tree induction algorithms", July 2011.
  3. Robert E. Schapire and Yoram Singer, "Improved boosting algorithms using confidence-rated predictions", Machine Learning, 1999.
  4. Bianca Zadrozny, John Langford and Naoki Abe, "Cost-Sensitive Learning by Cost-Proportionate Example Weighting", Proceedings of the Third IEEE International Conference on Data Mining (ICDM'03), 2003.
  5. Geoffrey I. Webb, "Cost-Sensitive Specialization", Proceedings of the 1996 Pacific Rim International Conference on Artificial Intelligence, Cairns, Springer-Verlag, pp. 23-34.
  6. Alan T. Remaley, Maureen L. Sampson, James M. DeLeo, Nancy A. Remaley, Beriuse D. Farsi and Mark H. Zweig, "Prevalence-Value-Accuracy Plot: A new method for comparing diagnostic tests based on misclassification costs", 1999.
  7. P. Domingos, "MetaCost: A general method for making classifiers cost-sensitive", In KDD, pages 155-164, 1999.
  8. Artur Ferreira, "Survey on boosting algorithms for supervised and semi-supervised learning", October 2007.
  9. Kai Ming Ting and Zijian Zheng, "Boosting Cost-sensitive Trees", Tenth International Conference on Discovery Science, LNAI-1038, pp. 134-145, Japan: Springer, 2007.
  10. Kai Ming Ting and Zijian Zheng, "Boosting Trees for Cost-sensitive Classification", Eighth International Conference on DS, Singapore, 2005.
  11. Wei Fan, Salvatore J. Stolfo, Junxin Zhang and Philip K. Chan, "AdaCost: Misclassification Cost-sensitive Boosting", Proceedings of the Sixteenth International Conference on Machine Learning (ICML), 1999.
  12. Web link: Oracle® Data Mining Concepts 11g Release 1, http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/classify.html
  13. Data-set downloaded from: http://tunedit.org/repo/UCI/credit-a.arff
  14. UCI Machine Learning Repository for datasets: http://archive.ics.uci.edu/ml/datasets.html
  15. Wikipedia page for Prevalence: http://en.wikipedia.org/wiki/Prevalence, updated on 5 July 2011.
  16. Wikipedia page for Sensitivity and Specificity: http://en.wikipedia.org/wiki/Sensitivity_and_specificity, updated on 19 July 2011.
  17. Tutorial on DB2 Business Intelligence, article on cost matrix: http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.im.model.doc/c_cost_matrix.html
  18. Tutorial on Oracle® Data Mining Concepts 11g Release 1: http://download.oracle.com/docs/cd/B28359_01/datamine.111/b28129/classify.htm
  19. OAIDTB: Boosting extensions for WEKA, web link: http://pisuerga.inf.ubu.es/lsi/Software/oaidtb/
  20. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann, 2006. ISBN 1558609016, 9781558609013.
Index Terms

Computer Science
Information Sciences

Keywords

AdaBoost, Cost-sensitive Classifiers, CSExtension1, CSExtension2, CSExtension3, CSExtension4, CSExtension5, Misclassification Cost, Number of High Cost Errors