Text Clustering Algorithms: A Review

Himanshu Suyal; Amit Panwar; Ajit Singh Negi

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 96 - Number 24

Year of Publication: 2014

Authors: Himanshu Suyal, Amit Panwar, Ajit Singh Negi

DOI: 10.5120/16946-7075

Himanshu Suyal, Amit Panwar, Ajit Singh Negi. Text Clustering Algorithms: A Review. International Journal of Computer Applications. 96, 24 (June 2014), 36-40. DOI=10.5120/16946-7075

@article{10.5120/16946-7075,
  author = {Himanshu Suyal and Amit Panwar and Ajit Singh Negi},
  title = {Text Clustering Algorithms: A Review},
  journal = {International Journal of Computer Applications},
  issue_date = {June 2014},
  volume = {96},
  number = {24},
  month = {June},
  year = {2014},
  issn = {0975-8887},
  pages = {36-40},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume96/number24/16946-7075/},
  doi = {10.5120/16946-7075},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-06T22:22:42.639762+05:30

%A Himanshu Suyal

%A Amit Panwar

%A Ajit Singh Negi

%T Text Clustering Algorithms: A Review

%J International Journal of Computer Applications

%@ 0975-8887

%V 96

%N 24

%P 36-40

%D 2014

%I Foundation of Computer Science (FCS), NY, USA

Abstract

With the growth of the Internet, the volume of text data produced by social networking sites, the web, and other information sources keeps increasing. This data is unstructured, which makes it tedious to analyze, so methods and algorithms are needed that can work with various text formats. Clustering is an important part of data mining: it is the process of dividing a large collection of text so that similar documents fall into the same class. Clustering is widely used in applications such as medicine, biology, and signal processing. This paper briefly covers the various kinds of text clustering algorithms and the present state of the field, and it analyzes and compares the algorithms on aspects that include sensitivity and stability. The algorithms covered include traditional approaches such as hierarchical clustering, density-based clustering, and self-organizing map clustering.
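As a rough illustration of the idea of grouping similar texts into the same class, the following is a minimal sketch of hierarchical (agglomerative) clustering over bag-of-words vectors. It is not taken from the paper: the function names `vectorize`, `cosine`, and `agglomerative`, the whitespace tokenizer, the average-link merge rule, and the four toy documents are all assumptions made for this example, and it expects `1 <= k <= len(docs)`.

```python
import math
from collections import Counter
from itertools import combinations


def vectorize(text):
    """Bag-of-words term-frequency vector for one document (assumed tokenizer: whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(docs, k):
    """Start with one cluster per document and repeatedly merge the two
    most similar clusters (average link) until only k clusters remain."""
    vectors = [vectorize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # combinations yields index pairs (a, b) with a < b, so deleting b is safe.
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy corpus (hypothetical): two biology-like and two signal-processing-like texts.
docs = [
    "gene protein cell biology",
    "protein cell gene expression",
    "signal filter noise processing",
    "noise signal filter frequency",
]
clusters = agglomerative(docs, k=2)
print(clusters)
```

On this toy corpus the two biology-like documents and the two signal-processing-like documents end up in separate clusters. A real pipeline would typically add term weighting such as TF-IDF and a more scalable merge strategy; this sketch recomputes all pairwise similarities at every step and is only meant to show the mechanism.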

Index Terms

Computer Science

Information Sciences

Keywords

Data mining, K-means clustering, text clustering, hierarchical clustering, prototype, density-based clustering