Research Article

Distributed AdaBoost Extensions for Cost-sensitive Classification Problems

by Ankit Desai, Sanjay Chaudhary
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 177 - Number 12
Year of Publication: 2019
Authors: Ankit Desai, Sanjay Chaudhary
10.5120/ijca2019919531

Ankit Desai, Sanjay Chaudhary. Distributed AdaBoost Extensions for Cost-sensitive Classification Problems. International Journal of Computer Applications. 177, 12 (Oct 2019), 1-8. DOI=10.5120/ijca2019919531

@article{ 10.5120/ijca2019919531,
author = { Ankit Desai and Sanjay Chaudhary },
title = { Distributed AdaBoost Extensions for Cost-sensitive Classification Problems },
journal = { International Journal of Computer Applications },
issue_date = { Oct 2019 },
volume = { 177 },
number = { 12 },
month = { Oct },
year = { 2019 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume177/number12/30946-2019919531/ },
doi = { 10.5120/ijca2019919531 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Ankit Desai
%A Sanjay Chaudhary
%T Distributed AdaBoost Extensions for Cost-sensitive Classification Problems
%J International Journal of Computer Applications
%@ 0975-8887
%V 177
%N 12
%P 1-8
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In data mining, classification has long been an area of interest, especially since the rapid increase in the availability of collected data. Cost-sensitive classification is a subset of the broader classification problem in which the focus is on solving the class imbalance problem. This paper addresses the class imbalance problem using Cost-sensitive Distributed Boosting (CsDb). CsDb is a meta-classifier, based on the concept of MapReduce, designed to solve the class imbalance problem for big data. The focus of this work is on data whose size is beyond the capacity of standalone commodity hardware. CsDb solves classification problems by learning models in a distributed environment. Empirical evaluation of CsDb on datasets from different application domains shows an average reduction in misclassification cost and in the number of high-cost errors of 21.06% and 30.15%, respectively, relative to its error-based predecessors. It preserves the cost-sensitivity of its cost-based predecessor. While preserving accuracy and F1-score, it reduces model building time by 90.14% compared to a non-distributed cost-sensitive classifier.
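
The two cost measures reported in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code. The cost matrix, the labels, and the function names below are hypothetical. The definition of a "high-cost error" as an error that incurs the largest entry of the cost matrix is an assumption made for illustration.

```python
def misclassification_cost(y_true, y_pred, cost):
    """Sum cost[actual][predicted] over all examples."""
    return sum(cost[a][p] for a, p in zip(y_true, y_pred))


def high_cost_errors(y_true, y_pred, cost):
    """Count errors that incur the largest cost in the matrix.

    Assumption: a "high-cost error" is a misclassification whose cost
    equals the maximum entry of the cost matrix.
    """
    worst = max(max(row) for row in cost)
    return sum(
        1 for a, p in zip(y_true, y_pred) if a != p and cost[a][p] == worst
    )


# Hypothetical two-class cost matrix, indexed as COST[actual][predicted].
# Missing the minority class (actual 1, predicted 0) costs 5;
# a false alarm (actual 0, predicted 1) costs 1; correct predictions cost 0.
COST = [[0, 1],
        [5, 0]]

# Hypothetical labels and predictions.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0]

total_cost = misclassification_cost(y_true, y_pred, COST)
n_high = high_cost_errors(y_true, y_pred, COST)
print(total_cost, n_high)
```

With unequal costs, two classifiers with the same accuracy can have very different total costs, which is why the abstract reports cost and high-cost errors alongside accuracy and F1-score.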

Index Terms

Computer Science
Information Sciences

Keywords

Class imbalance problem, distributed boosting, distributed classification