Research Article

Text Classification by Enhancing Weights of Terms based on their Positional Appearances

by Anagha R Kulkarni, Vrinda Tokekar, Parag Kulkarni
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 78 - Number 9
Year of Publication: 2013
DOI: 10.5120/13518-1298

Anagha R Kulkarni, Vrinda Tokekar, Parag Kulkarni. Text Classification by Enhancing Weights of Terms based on their Positional Appearances. International Journal of Computer Applications. 78, 9 (September 2013), 23-26. DOI=10.5120/13518-1298

@article{ 10.5120/13518-1298,
author = { Anagha R Kulkarni and Vrinda Tokekar and Parag Kulkarni },
title = { Text Classification by Enhancing Weights of Terms based on their Positional Appearances },
journal = { International Journal of Computer Applications },
issue_date = { September 2013 },
volume = { 78 },
number = { 9 },
month = { September },
year = { 2013 },
issn = { 0975-8887 },
pages = { 23-26 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume78/number9/13518-1298/ },
doi = { 10.5120/13518-1298 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:51:17.353533+05:30
%A Anagha R Kulkarni
%A Vrinda Tokekar
%A Parag Kulkarni
%T Text Classification by Enhancing Weights of Terms based on their Positional Appearances
%J International Journal of Computer Applications
%@ 0975-8887
%V 78
%N 9
%P 23-26
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A huge store of hidden information is available in text documents, and extracting accurate, useful information from this store is very important. The Multinomial Naïve Bayes classification algorithm is effective in processing text and extracting accurate information. A new approach that assigns weights to terms based on their positional appearance is proposed. The effectiveness of this approach is demonstrated on two standard text datasets, Reuters-21578 and 20-newsgroups. The proposed approach improves the average F-measure by at least 1.0% for Reuters-21578 and by at least 2% for 20-newsgroups.
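
The abstract does not give the weighting formula, so the following is only a minimal sketch of how position-based term weights could feed a Multinomial Naïve Bayes classifier. The scheme used here (terms in the leading part of a document count `boost` times) and the names `positional_counts`, `head_fraction`, `boost`, and `WeightedMultinomialNB` are illustrative assumptions, not the authors' method.

```python
import math
from collections import defaultdict


def positional_counts(tokens, head_fraction=0.2, boost=2.0):
    """Weighted term counts for one document.

    ASSUMPTION: tokens in the leading `head_fraction` of the document
    count `boost` times; all other tokens count once. This stands in for
    the paper's positional scheme, which the abstract does not specify.
    """
    counts = defaultdict(float)
    head_len = max(1, int(len(tokens) * head_fraction)) if tokens else 0
    for i, tok in enumerate(tokens):
        counts[tok] += boost if i < head_len else 1.0
    return dict(counts)


class WeightedMultinomialNB:
    """Multinomial Naive Bayes over weighted (non-integer) term counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        self.vocab = set()
        self.class_docs = defaultdict(int)
        self.term_weights = defaultdict(lambda: defaultdict(float))
        for tokens, label in zip(docs, labels):
            self.class_docs[label] += 1
            for term, w in positional_counts(tokens).items():
                self.term_weights[label][term] += w
                self.vocab.add(term)
        self.n_docs = len(docs)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n in self.class_docs.items():
            total = sum(self.term_weights[label].values())
            score = math.log(n / self.n_docs)  # log prior
            for term, w in positional_counts(tokens).items():
                if term not in self.vocab:
                    continue  # ignore terms never seen in training
                p = (self.term_weights[label].get(term, 0.0) + self.alpha) / (
                    total + self.alpha * vocab_size
                )
                score += w * math.log(p)  # weighted log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented purely for illustration.
train_docs = [
    ["oil", "prices", "rise", "today"],
    ["crude", "oil", "exports", "fall"],
    ["team", "wins", "final", "match"],
    ["coach", "praises", "team", "effort"],
]
train_labels = ["energy", "energy", "sport", "sport"]
model = WeightedMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["oil", "exports", "rise"]))
```

Setting `boost=1.0` reduces the sketch to plain term-frequency Multinomial Naïve Bayes, which gives a baseline for comparing F-measure with and without positional weighting.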

Index Terms

Computer Science
Information Sciences

Keywords

Term Weighting, Document Classification, Multinomial Naïve Bayes Classification Algorithm