
A Text Clustering Comparison Methodology

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
F.M. Kwale, P.W. Wagacha, A. Mwaura

F M Kwale, P W Wagacha and A Mwaura. A Text Clustering Comparison Methodology. International Journal of Computer Applications 139(13):12-19, April 2016. Published by Foundation of Computer Science (FCS), NY, USA.


Text clustering is the problem of dividing text documents into groups such that documents in the same group are more similar to one another than to documents in other groups. Although different algorithms have been compared in an attempt to choose some over others, such comparisons have been found to be either too limited or inadequate. Either the researchers (who are usually the authors of one of the algorithms being compared) did not apply a formal comparison methodology, or the comparisons were based on inadequate data, metrics and procedures. The comparisons also always focus only on the aspects in which the authors' own algorithms are superior. The few competing algorithms appear to be carefully selected so that they perform worse than the authors' algorithms on those aspects. Thus, there is still a large gap regarding the most suitable methodology for comparing the algorithms.

In this paper, a methodology for fairly comparing text clustering algorithms is proposed.




Clustering, Text Clustering, Metrics.