Research Article

Text Clustering Algorithms: A Review

by Himanshu Suyal, Amit Panwar, Ajit Singh Negi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 96 - Number 24
Year of Publication: 2014
10.5120/16946-7075

Himanshu Suyal, Amit Panwar, Ajit Singh Negi. Text Clustering Algorithms: A Review. International Journal of Computer Applications. 96, 24 (June 2014), 36-40. DOI=10.5120/16946-7075

@article{10.5120/16946-7075,
author = {Himanshu Suyal and Amit Panwar and Ajit Singh Negi},
title = {Text Clustering Algorithms: A Review},
journal = {International Journal of Computer Applications},
issue_date = {June 2014},
volume = {96},
number = {24},
month = {June},
year = {2014},
issn = {0975-8887},
pages = {36--40},
numpages = {5},
url = {https://ijcaonline.org/archives/volume96/number24/16946-7075/},
doi = {10.5120/16946-7075},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Himanshu Suyal
%A Amit Panwar
%A Ajit Singh Negi
%T Text Clustering Algorithms: A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 96
%N 24
%P 36-40
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

With the growth of the Internet, the amount of text data is increasing rapidly. This data is created by different media such as social networking sites, the web, and other information sources. It is largely unstructured, which makes it tedious to analyze, so we need methods and algorithms that can work with many types of text formats. Clustering is an important part of data mining: it is the process of dividing a large collection of text so that documents of a similar type are placed in the same class. Clustering is widely used in many applications, such as medicine, biology, and signal processing. This paper briefly covers the various kinds of text clustering algorithms, the present state of text clustering, and an analysis and comparison of these algorithms with respect to aspects such as sensitivity and stability. The algorithms discussed include traditional methods such as hierarchical clustering, density-based clustering, and self-organizing map (SOM) clustering.
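As a rough illustration of one of the traditional families named above (not the authors' method), the sketch below represents each document as a TF-IDF vector in the vector space model and then runs bottom-up (agglomerative) hierarchical clustering with average linkage on cosine similarity. Whitespace tokenization, the smoothed IDF formula log((1 + N) / (1 + df)) + 1, and the helper names `tfidf_vectors`, `cosine`, and `agglomerative` are assumptions chosen for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for each document.

    Assumptions for this sketch: lowercase whitespace tokenization and the
    smoothed IDF log((1 + N) / (1 + df)) + 1, which keeps every weight positive.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def agglomerative(vectors, k):
    """Average-linkage agglomerative clustering down to k clusters.

    Every document starts in its own cluster; the pair of clusters with the
    highest average pairwise cosine similarity is merged until k remain.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, best_score = None, -1.0
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                score = total / (len(clusters[a]) * len(clusters[b]))
                if score > best_score:
                    best, best_score = (a, b), score
        a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


docs = [
    "stock market prices fall",
    "market prices rise on stock news",
    "football team wins the match",
    "the team lost the football match",
]
print(sorted(sorted(c) for c in agglomerative(tfidf_vectors(docs), 2)))
# prints [[0, 1], [2, 3]]
```

On these four toy sentences the two finance documents and the two football documents end up in separate clusters. The naive merge loop recomputes every cluster-pair score on each iteration, so it only suits small examples; large document collections need a more efficient implementation.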

Index Terms

Computer Science
Information Sciences

Keywords

Data mining, K-means clustering, Text clustering, Hierarchical clustering, Prototype, Density-based clustering