Evaluation of Clustering around Weighted Prototype and Genetic Algorithm for Document Categorization

Garima Jain; Shailendra Kumar Shrivastava


Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 125 - Number 14

Year of Publication: 2015

DOI: 10.5120/ijca2015906260

Garima Jain, Shailendra Kumar Shrivastava. Evaluation of Clustering around Weighted Prototype and Genetic Algorithm for Document Categorization. International Journal of Computer Applications. 125, 14 (September 2015), 21-27. DOI=10.5120/ijca2015906260

@article{ 10.5120/ijca2015906260,

author = { Garima Jain and Shailendra Kumar Shrivastava },

title = { Evaluation of Clustering around Weighted Prototype and Genetic Algorithm for Document Categorization },

journal = { International Journal of Computer Applications },

issue_date = { September 2015 },

volume = { 125 },

number = { 14 },

month = { September },

year = { 2015 },

issn = { 0975-8887 },

pages = { 21-27 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume125/number14/22500-2015906260/ },

doi = { 10.5120/ijca2015906260 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T23:16:02.994452+05:30

%A Garima Jain

%A Shailendra Kumar Shrivastava

%T Evaluation of Clustering around Weighted Prototype and Genetic Algorithm for Document Categorization

%J International Journal of Computer Applications

%@ 0975-8887

%V 125

%N 14

%P 21-27

%D 2015

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Document clustering is an important task in text categorization. A genetic algorithm is an optimization technique that can be applied to search for the best cluster centres by computing the fitness values of data points. Clustering around weighted prototypes, by contrast, is especially useful when reliable pairwise similarities between documents are available, but it does not find the global optimum of its objective function. Experimental results show that the genetic algorithm achieves a better F-measure and normalized mutual information (NMI) than clustering around weighted prototypes on the 20 Newsgroups dataset, and a better F-measure and accuracy on the Reuters-21578 dataset.
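The abstract describes the genetic algorithm only at the level of computing fitness values for candidate cluster centres. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation. It assumes each chromosome encodes k cluster centres as indices of existing documents, uses cosine similarity between term vectors, and scores a chromosome by the summed similarity of every document to its closest centre. Tournament selection, union crossover, index-swap mutation, elitism, and all parameter values (`pop_size`, `generations`, `mutation_rate`, `seed`) are illustrative assumptions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def fitness(chromosome, docs):
    """Sum over all documents of the similarity to the most similar centre (higher is better)."""
    return sum(max(cosine(d, docs[c]) for c in chromosome) for d in docs)


def tournament(population, scores, rng, size=3):
    """Return the fittest of `size` randomly drawn chromosomes."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]


def crossover(p1, p2, k, rng):
    """Child takes k distinct centre indices from the union of both parents' centres."""
    pool = sorted(set(p1) | set(p2))
    return rng.sample(pool, k)


def mutate(chromosome, n_docs, rate, rng):
    """With probability `rate` per gene, replace a centre by a document not already used."""
    child = list(chromosome)
    for i in range(len(child)):
        if rng.random() < rate:
            unused = [j for j in range(n_docs) if j not in child]
            if unused:
                child[i] = rng.choice(unused)
    return child


def ga_cluster(docs, k, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve k centre indices; return (best centres, cluster label per document)."""
    rng = random.Random(seed)
    n = len(docs)
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(c, docs) for c in population]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        next_gen = [population[i] for i in ranked[:2]]  # elitism: keep the two fittest
        while len(next_gen) < pop_size:
            p1 = tournament(population, scores, rng)
            p2 = tournament(population, scores, rng)
            next_gen.append(mutate(crossover(p1, p2, k, rng), n, mutation_rate, rng))
        population = next_gen
    best = max(population, key=lambda c: fitness(c, docs))
    # Assign each document to the centre it is most similar to.
    labels = [max(range(k), key=lambda i: cosine(d, docs[best[i]])) for d in docs]
    return best, labels


# Toy "documents": two obvious topics over a 2-term vocabulary.
docs = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
        [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]]
centres, labels = ga_cluster(docs, k=2)
print(centres, labels)
```

Encoding centres as document indices keeps every candidate centre a real document and avoids averaging sparse vectors; a real-valued centroid encoding is a common alternative. In practice `docs` would hold TF-IDF vectors, and the resulting labels could then be compared with the known categories using F-measure, NMI or accuracy.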


Index Terms

Computer Science

Information Sciences

Keywords

Clustering, Similarity Based, Genetic Algorithm, Document Categorization, Text Mining