DOI: 10.5120/ijca2016909515
F M Kwale, P W Wagacha and A Mwaura. Article: A Text Clustering Comparison Methodology. International Journal of Computer Applications 139(13):12-19, April 2016. Published by Foundation of Computer Science (FCS), NY, USA.
BibTeX
@article{key:article, author = {F.M. Kwale and P.W. Wagacha and A. Mwaura}, title = {Article: A Text Clustering Comparison Methodology}, journal = {International Journal of Computer Applications}, year = {2016}, volume = {139}, number = {13}, pages = {12-19}, month = {April}, note = {Published by Foundation of Computer Science (FCS), NY, USA} }
Abstract
Text clustering is the problem of dividing text documents into groups such that documents in the same group are more similar to one another than to documents in other groups. Although the different algorithms have been compared in attempts to choose some over others, such comparisons have been found to be too limited or otherwise inadequate. Either the researchers (who are usually the authors of one of the algorithms being compared) did not apply a formal comparison methodology, or the comparisons were based on inadequate data, metrics and procedures. Moreover, the comparisons always focus only on the aspects where the authors' own algorithms are superior to the others. The few algorithms compared against theirs appear to be carefully selected so that they perform worse on those aspects. Thus, there is still a large gap regarding the most suitable methodology for comparing the algorithms.
In this paper, a methodology for fairly comparing text clustering algorithms is proposed.
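The abstract does not spell out the methodology itself, so the following is only a minimal sketch of the general idea of a like-for-like comparison: every algorithm's output is scored with the same metrics against the same reference labels. The metric choices (purity and the Rand index), the function names, the toy labels, and the two algorithm outputs are all illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def purity(gold, predicted):
    """Fraction of documents that belong to the majority gold class of their cluster."""
    clusters = {}
    for g, p in zip(gold, predicted):
        clusters.setdefault(p, []).append(g)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(gold)


def rand_index(gold, predicted):
    """Fraction of document pairs on which the clustering and the gold labels agree."""
    pairs = list(combinations(range(len(gold)), 2))
    agreements = sum(
        (gold[i] == gold[j]) == (predicted[i] == predicted[j]) for i, j in pairs
    )
    return agreements / len(pairs)


def compare(gold, outputs, metrics):
    """Score every algorithm's output with every metric on the same gold labels."""
    return {
        name: {metric.__name__: metric(gold, labels) for metric in metrics}
        for name, labels in outputs.items()
    }


# Hypothetical gold topics for six documents (assumption, for illustration only).
gold = ["sport", "sport", "sport", "politics", "politics", "politics"]

# Hypothetical cluster assignments produced by two unnamed algorithms.
outputs = {
    "algorithm_a": [0, 0, 0, 1, 1, 1],
    "algorithm_b": [0, 0, 1, 1, 2, 2],
}

results = compare(gold, outputs, [purity, rand_index])
for name, scores in results.items():
    print(name, scores)
```

Reporting more than one metric matters even in a toy setting like this: purity on its own can be pushed to its maximum by placing every document in its own cluster, whereas a pair-based measure such as the Rand index penalises splitting documents that belong together.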
Keywords
Clustering, Text Clustering, Metrics.