A Text Clustering Comparison Methodology

F.M. Kwale; P.W. Wagacha; A. Mwaura

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 139 - Number 13

Year of Publication: 2016

Authors: F.M. Kwale, P.W. Wagacha, A. Mwaura

10.5120/ijca2016909515

F.M. Kwale, P.W. Wagacha, A. Mwaura. A Text Clustering Comparison Methodology. International Journal of Computer Applications. 139, 13 (April 2016), 12-19. DOI=10.5120/ijca2016909515

@article{ 10.5120/ijca2016909515,
  author = { F.M. Kwale and P.W. Wagacha and A. Mwaura },
  title = { A Text Clustering Comparison Methodology },
  journal = { International Journal of Computer Applications },
  issue_date = { April 2016 },
  volume = { 139 },
  number = { 13 },
  month = { April },
  year = { 2016 },
  issn = { 0975-8887 },
  pages = { 12-19 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume139/number13/24550-2016909515/ },
  doi = { 10.5120/ijca2016909515 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:40:51.188870+05:30
%A F.M. Kwale
%A P.W. Wagacha
%A A. Mwaura
%T A Text Clustering Comparison Methodology
%J International Journal of Computer Applications
%@ 0975-8887
%V 139
%N 13
%P 12-19
%D 2016
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Text clustering is the problem of dividing text documents into groups such that documents in the same group are more similar to one another than to documents in other groups. Although different algorithms have been compared in attempts to favor some over others, such comparisons have been found to be either too limited or inadequate. Either the researchers (who are usually the authors of one of the algorithms being compared) did not apply a formal comparison methodology, or the comparisons were based on inadequate data, metrics, and procedures. Also, the comparisons always focus only on the aspects in which the authors' own algorithms are superior to the others, and the few competing algorithms appear to be carefully selected because they perform worse on those aspects. Thus, there is still a large gap regarding the most suitable methodology for comparing the algorithms. In this paper, a methodology for fairly comparing text clustering algorithms is proposed.
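
As a minimal illustration of what a like-for-like comparison involves, the sketch below scores the outputs of two clustering algorithms against the same labeled documents using the same metric. It is not the methodology proposed in the paper: the purity metric, the function names `purity` and `compare`, the algorithm names, and the toy labels are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def purity(assignments, gold_labels):
    """Fraction of documents that belong to the majority gold class of their cluster."""
    if len(assignments) != len(gold_labels):
        raise ValueError("assignments and gold_labels must have the same length")
    if not assignments:
        raise ValueError("at least one document is required")

    # Count how often each gold label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster_id, label in zip(assignments, gold_labels):
        label_counts[cluster_id][label] += 1

    # Each cluster contributes the size of its largest gold class.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(gold_labels)


def compare(outputs, gold_labels):
    """Score every algorithm's output on the same data with the same metric."""
    return {name: purity(clusters, gold_labels) for name, clusters in outputs.items()}


# Hypothetical gold topics for six documents and two hypothetical algorithm outputs.
gold = ["sport", "sport", "sport", "politics", "politics", "politics"]
outputs = {
    "algorithm_a": [0, 0, 0, 1, 1, 1],
    "algorithm_b": [0, 0, 1, 1, 1, 1],
}
scores = compare(outputs, gold)
for name, score in sorted(scores.items()):
    print(f"{name}: purity = {score:.3f}")
```

Purity alone rewards solutions with many small clusters, so a single metric of this kind is an example of the inadequate evidence the abstract warns about; a fair comparison would report several metrics over several datasets.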

Index Terms

Computer Science

Information Sciences

Keywords

Clustering, Text Clustering, Metrics