Data Mining: Document Classification using Naive Bayes Classifier

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Authors:
Ekta Jadon, Roopesh Sharma
10.5120/ijca2017913925

Ekta Jadon and Roopesh Sharma. Data Mining: Document Classification using Naive Bayes Classifier. International Journal of Computer Applications 167(6):13-16, June 2017.

@article{10.5120/ijca2017913925,
	author = {Ekta Jadon and Roopesh Sharma},
	title = {Data Mining: Document Classification using Naive Bayes Classifier},
	journal = {International Journal of Computer Applications},
	issue_date = {June 2017},
	volume = {167},
	number = {6},
	month = {Jun},
	year = {2017},
	issn = {0975-8887},
	pages = {13-16},
	numpages = {4},
	url = {http://www.ijcaonline.org/archives/volume167/number6/27774-2017913925},
	doi = {10.5120/ijca2017913925},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

In data mining, classification splits the data into several dependent and independent regions, each of which is referred to as a class. Different kinds of classifiers are used to accomplish the classification task. Moreover, classification is constrained when it is applied to text documents. The aim of the work presented in this article is to evaluate multiclass document classification and to determine the classification accuracy achieved on text documents. The Naive Bayes approach is used to deal with the problem of document classification via a deceptively simple model. It is applied in both a flat (linear) and a hierarchical manner to improve the efficiency of the classification model. It has been found that the hierarchical classification technique is more effective than flat classification. It also performs better in the case of multi-label document classification. Compared with the past, we observe a significant increase in the amount of data generated each day. Hence, with the advent of smarter technologies, data must be classified and sorted before decisions can be drawn from it. Many techniques are available for classifying documents into various categories or labels. Data mining is the process of non-trivial extraction of novel, implicit, and actionable knowledge from large data sets.
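As an illustration of the flat Naive Bayes approach mentioned in the abstract, the following is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing. It is not the authors' implementation: the function names, the whitespace tokenization, the smoothing choice, and the toy training documents are all assumptions made for this example, and the hierarchical variant is not covered.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and words per class.

    docs is a list of (text, label) pairs; this format is an assumption
    of this sketch, not something specified in the article.
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()  # naive whitespace tokenization
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_doc_counts": class_doc_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "n_docs": len(docs),
    }


def classify(model, text):
    """Return the class with the highest log posterior score."""
    # Words never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_class_docs in model["class_doc_counts"].items():
        # Log prior: fraction of training documents in this class.
        score = math.log(n_class_docs / model["n_docs"])
        total_words = sum(model["word_counts"][label].values())
        for token in tokens:
            # Add-one smoothing avoids zero probabilities.
            count = model["word_counts"][label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
training = [
    ("the team won the match", "sports"),
    ("a great goal in the final match", "sports"),
    ("the election results were announced", "politics"),
    ("parliament passed the new law", "politics"),
]
model = train_naive_bayes(training)
print(classify(model, "the match ended with a late goal"))
print(classify(model, "parliament announced the law"))
```

The "naive" part is the assumption that words are conditionally independent given the class, which is why the per-word log probabilities are simply summed.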

Keywords

Data Mining, Mining Techniques, Classification, Document Classification, Naïve Bayes Classifier.