Research Article

A Comparative Analysis of Various Classifications in Vector Space Model with Absolute Pruning

by Nandni Patel, Santosh Vishwakarma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 172 - Number 8
Year of Publication: 2017
Authors: Nandni Patel, Santosh Vishwakarma
10.5120/ijca2017915199

Nandni Patel, Santosh Vishwakarma. A Comparative Analysis of Various Classifications in Vector Space Model with Absolute Pruning. International Journal of Computer Applications 172, 8 (Aug 2017), 34-38. DOI=10.5120/ijca2017915199

@article{ 10.5120/ijca2017915199,
author = { Nandni Patel, Santosh Vishwakarma },
title = { A Comparative Analysis of Various Classifications in Vector Space Model with Absolute Pruning },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2017 },
volume = { 172 },
number = { 8 },
month = { Aug },
year = { 2017 },
issn = { 0975-8887 },
pages = { 34-38 },
numpages = { 5 },
url = { https://ijcaonline.org/archives/volume172/number8/28274-2017915199/ },
doi = { 10.5120/ijca2017915199 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Nandni Patel
%A Santosh Vishwakarma
%T A Comparative Analysis of Various Classifications in Vector Space Model with Absolute Pruning
%J International Journal of Computer Applications
%@ 0975-8887
%V 172
%N 8
%P 34-38
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text classification is an important problem in text mining, used to assign category labels to unlabeled documents. In this work, various classification models were evaluated after pre-processing of the text dataset. The pre-processing steps include tokenization, stop-word removal and stemming, after which different term-weighting schemes were implemented. Various pruning techniques were also applied to retain only the most frequent terms. Based on this analysis, we conclude that the Naïve Bayes method gives the highest accuracy when compared with other state-of-the-art text classifiers.
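The pipeline the abstract describes can be sketched in plain Python. This is an illustrative reconstruction on a toy corpus, not the authors' actual setup (the references suggest RapidMiner was used): the stop-word list, the crude suffix-stripping stemmer, the `min_count` pruning threshold, and the example documents are all simplified assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Minimal stop-word list -- a simplified stand-in for a real stop-word lexicon.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "in", "to", "for"}

def preprocess(text):
    """Tokenize, drop stop words, and strip a trailing 's' as a crude stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [t[:-1] if t.endswith("s") else t for t in tokens]

def absolute_prune(docs_tokens, min_count=2):
    """Absolute pruning: keep only terms whose total corpus count reaches
    a fixed threshold (min_count=2 is an assumed value)."""
    totals = Counter(t for toks in docs_tokens for t in toks)
    vocab = {t for t, c in totals.items() if c >= min_count}
    return [[t for t in toks if t in vocab] for toks in docs_tokens], vocab

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs_tokens, labels):
        self.vocab = {t for toks in docs_tokens for t in toks}
        self.priors = Counter(labels)
        self.n_docs = len(labels)
        self.term_counts = defaultdict(Counter)
        for toks, y in zip(docs_tokens, labels):
            self.term_counts[y].update(toks)
        return self

    def predict(self, tokens):
        best, best_score = None, -math.inf
        for y, n_y in self.priors.items():
            score = math.log(n_y / self.n_docs)           # log prior
            total = sum(self.term_counts[y].values())
            for t in tokens:
                if t in self.vocab:                        # smoothed log likelihood
                    score += math.log(
                        (self.term_counts[y][t] + 1) / (total + len(self.vocab))
                    )
            if score > best_score:
                best, best_score = y, score
        return best

# Toy two-class corpus (illustrative data, not the paper's dataset).
docs = [
    "the players won the match",
    "great match and great players",
    "the compiler optimizes the code",
    "code and compilers in programs",
]
labels = ["sport", "sport", "tech", "tech"]

tokenized = [preprocess(d) for d in docs]
pruned, vocab = absolute_prune(tokenized, min_count=2)
nb = NaiveBayes().fit(pruned, labels)
print(nb.predict(preprocess("players in a match")))  # -> sport
```

Raising or lowering `min_count` shrinks or grows the vocabulary, which is the knob a pruning comparison like the paper's turns before the classifiers are compared.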

References
  1. Zhai, Chengxiang, and John Lafferty. "A study of smoothing methods for language models applied to ad hoc information retrieval." ACM SIGIR Forum. Vol. 51. No. 2. ACM, 2017.
  2. Beel, Joeran, Stefan Langer, and Bela Gipp. "TF-IDuF: A Novel Term-Weighting Scheme for User Modeling based on Users’ Personal Document Collections." Proceedings of the 12th Conference. 2017.
  3. Deng, Zhi-Hong, Kun-Hu Luo, and Hong-Liang Yu. "A study of supervised term weighting scheme for sentiment analysis." Expert Systems with Applications 41.7 (2014): 3506-3513.
  4. Frei, Hans-Peter. "Information retrieval-from academic research to practical applications." In: Proceedings of the 5th Annual Symposium on Document Analysis and Information Retrieval, Las Vegas. 1996.
  5. Cummins, Ronan, and Colm O'Riordan. "An evaluation of evolved term-weighting schemes in information retrieval." Proceedings of the 14th ACM international conference on Information and knowledge management. ACM, 2005.
  6. Cummins, Ronan, and Colm O’Riordan. "Determining general term weighting schemes for the vector space model of information retrieval using genetic programming." 15th Artificial Intelligence and Cognitive Science Conference (AICS 2004). 2004.
  7. Jin, Rong, Joyce Y. Chai, and Luo Si. "Learn to weight terms in information retrieval using category information." Proceedings of the 22nd international conference on Machine learning. ACM, 2005.
  8. Reed, Joel W., et al. "TF-ICF: A new term weighting scheme for clustering dynamic data streams." Machine Learning and Applications, 2006.
  9. Dolamic, Ljiljana, and Jacques Savoy. "UniNE at FIRE 2010: Hindi, Bengali, and Marathi IR." FIRE 2010.
  10. Paul McNamee and James Mayfield, Character N-gram Tokenization for European Language Text Retrieval. Information Retrieval, 7:73-97, 2004.
  11. Mierswa, I., et al. "Rapid prototyping for complex data mining tasks." Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 935-940. ACM, 2006.
  12. Land, Sebastian, and Simon Fisher. "RapidMiner in academic use." 2012. www.rapid-i.com.
  13. Mierswa, I., et al. "YALE: Rapid Prototyping for Complex Data Mining Tasks." Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), pp. 935-940, 2006.
  14. Paolo Palmerini. "On performance of data mining: from algorithms to management systems for data exploration." Technical Report, Università Ca' Foscari di Venezia, 2004.
  15. Vishwakarma, Santosh K., et al. "Monolingual Information Retrieval using Terrier: FIRE 2010 Experiments based on n-gram indexing." Procedia Computer Science 57 (2015): 815-820.
  16. "Text mining: The state of the art and the challenges." Proceedings of the PAKDD 1999 Workshop on Knowledge Discovery from Advanced Databases. Vol. 8, 1999.
Index Terms

Computer Science
Information Sciences

Keywords

Text Classification Models, Pruning Methods, Vector Space Model, Absolute Pruning