Research Article

Enhanced Classification Accuracy on Naive Bayes Data Mining Models

by Md. Faisal Kabir, Chowdhury Mofizur Rahman, Alamgir Hossain, Keshav Dahal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 28 - Number 3
Year of Publication: 2011
DOI: 10.5120/3371-4657

Md. Faisal Kabir, Chowdhury Mofizur Rahman, Alamgir Hossain, Keshav Dahal . Enhanced Classification Accuracy on Naive Bayes Data Mining Models. International Journal of Computer Applications. 28, 3 ( August 2011), 9-16. DOI=10.5120/3371-4657

@article{ 10.5120/3371-4657,
author = { Md. Faisal Kabir, Chowdhury Mofizur Rahman, Alamgir Hossain, Keshav Dahal },
title = { Enhanced Classification Accuracy on Naive Bayes Data Mining Models },
journal = { International Journal of Computer Applications },
issue_date = { August 2011 },
volume = { 28 },
number = { 3 },
month = { August },
year = { 2011 },
issn = { 0975-8887 },
pages = { 9-16 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume28/number3/3371-4657/ },
doi = { 10.5120/3371-4657 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Md. Faisal Kabir
%A Chowdhury Mofizur Rahman
%A Alamgir Hossain
%A Keshav Dahal
%T Enhanced Classification Accuracy on Naive Bayes Data Mining Models
%J International Journal of Computer Applications
%@ 0975-8887
%V 28
%N 3
%P 9-16
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A classification paradigm is a data mining framework containing all the concepts extracted from the training dataset to differentiate one class from the other classes existing in the data. The primary goal of a classification framework is to deliver high accuracy. In practice, however, accuracy often suffers, particularly on large datasets and on datasets that contain several distinct groups of data. When a classification framework is trained on the whole dataset at once, the algorithm may perform poorly because the dataset consists of several such groups. An alternative way of making classification usable is to identify groups of similar data within the whole training set and then train on each group separately. In our paper, we first split the training data using k-means clustering and then train each group with the Naive Bayes classification algorithm. In addition, we save each model for classifying unknown (test) samples: an unknown sample is classified with the model of its best-matching group, attaining a higher accuracy rate than the conventional Naive Bayes classifier.
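The pipeline described above (cluster the training set with k-means, fit one Naive Bayes model per cluster, then classify each unknown sample with the model of its best-matching cluster) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation; the cluster count `k`, the Gaussian likelihood for the Naive Bayes step, and all function names are assumptions made for the sketch.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means: returns centroids and final cluster assignments."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centroids[j] = X[labels == j].mean(axis=0)
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    return centroids, labels

class GaussianNB:
    """Gaussian Naive Bayes: one mean and variance per class and feature."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) for c in self.classes]) + 1e-9
        self.logprior = np.log([(y == c).mean() for c in self.classes])
        return self
    def predict(self, X):
        # log P(c) + sum_f log N(x_f | mu_cf, var_cf), maximised over classes c
        ll = -0.5 * (((X[:, None] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var))
        return self.classes[(ll.sum(axis=2) + self.logprior).argmax(axis=1)]

def fit_clustered_nb(X, y, k=2):
    """Split the training data with k-means, then fit one NB model per group."""
    centroids, labels = kmeans(X, k)
    models = {j: GaussianNB().fit(X[labels == j], y[labels == j])
              for j in range(k) if np.any(labels == j)}
    return centroids, models

def predict_clustered_nb(centroids, models, X):
    """Classify each sample with the model of its nearest (best-match) cluster."""
    nearest = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    pred = np.empty(len(X), dtype=int)
    for j, model in models.items():
        mask = nearest == j
        if mask.any():
            pred[mask] = model.predict(X[mask])
    return pred
```

Routing a test sample to one cluster's model is what the paper calls classifying with the "best match group/model"; the conventional baseline corresponds to fitting a single `GaussianNB` on the whole training set.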

Index Terms

Computer Science
Information Sciences

Keywords

Classification, Naive Bayes, Clustering, Classification accuracy