Text Clustering Algorithms: A Review

International Journal of Computer Applications
© 2014 by IJCA Journal
Volume 96 - Number 24
Year of Publication: 2014
Himanshu Suyal
Amit Panwar
Ajit Singh Negi

Himanshu Suyal, Amit Panwar and Ajit Singh Negi. Article: Text Clustering Algorithms: A Review. International Journal of Computer Applications 96(24):36-40, June 2014. Full text available. BibTeX

	author = {Himanshu Suyal and Amit Panwar and Ajit Singh Negi},
	title = {Article: Text Clustering Algorithms: A Review},
	journal = {International Journal of Computer Applications},
	year = {2014},
	volume = {96},
	number = {24},
	pages = {36-40},
	month = {June},
	note = {Full text available}


With the growth of the Internet, the amount of text data is increasing rapidly, generated by sources such as social networking sites, the web, and other information sources. This data is unstructured, which makes it tedious to analyze, so we need methods and algorithms that can work with various text formats. Clustering is an important part of data mining. It is the process of grouping a large collection of texts so that similar texts fall into the same class. Clustering is widely used in application areas such as medicine, biology, and signal processing. This paper briefly covers the various kinds of text clustering algorithms and their present state, and analyzes and compares them on aspects including sensitivity and stability. The algorithms covered include traditional clustering approaches such as hierarchical clustering, density-based clustering, and self-organizing map clustering.
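As a rough illustration of one of the algorithm families named above, the following is a minimal sketch of hierarchical (agglomerative) clustering over a bag-of-words vector space. It is not the method of any specific paper reviewed here: the function names `tf_vector`, `cosine`, and `agglomerative`, the choice of average-link cosine similarity, and the four sample documents are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(docs, k):
    """Start with one cluster per document and repeatedly merge the two
    most similar clusters (average link) until only k clusters remain."""
    vectors = [tf_vector(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def similarity(c1, c2):
        # Average pairwise cosine similarity between two clusters.
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > k:
        # Pick the pair of cluster positions (a < b) with highest similarity.
        a, b = max(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda p: similarity(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical sample documents: two medical, two signal-processing.
docs = [
    "the patient received a new heart medicine",
    "doctors tested the heart medicine on a patient",
    "the signal filter removes noise from the audio signal",
    "audio noise is reduced by a digital signal filter",
]
clusters = agglomerative(docs, k=2)
print(clusters)
```

On this toy input the two medical documents and the two signal-processing documents end up in separate clusters. A practical system would likely add term weighting (for example TF-IDF) and stop-word removal, which this sketch omits for brevity.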

