Research Article

Phrase based Clustering Scheme of Suffix Tree Document Clustering Model

by Anoop Kumar Jain, Satyam Maheshwari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 63 - Number 10
Year of Publication: 2013
Authors: Anoop Kumar Jain, Satyam Maheshwari
10.5120/10504-5273

Anoop Kumar Jain, Satyam Maheshwari. Phrase based Clustering Scheme of Suffix Tree Document Clustering Model. International Journal of Computer Applications. 63, 10 (February 2013), 30-37. DOI=10.5120/10504-5273

@article{ 10.5120/10504-5273,
author = { Anoop Kumar Jain and Satyam Maheshwari },
title = { Phrase based Clustering Scheme of Suffix Tree Document Clustering Model },
journal = { International Journal of Computer Applications },
issue_date = { February 2013 },
volume = { 63 },
number = { 10 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 30-37 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume63/number10/10504-5273/ },
doi = { 10.5120/10504-5273 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:14:01.647866+05:30
%A Anoop Kumar Jain
%A Satyam Maheshwari
%T Phrase based Clustering Scheme of Suffix Tree Document Clustering Model
%J International Journal of Computer Applications
%@ 0975-8887
%V 63
%N 10
%P 30-37
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Document clustering is a challenging and relatively recent research area within search engine research. Most existing document clustering techniques cluster documents using a set of keywords taken from each document. Document clustering originates in the information retrieval domain: it finds a grouping of a set of documents such that documents belonging to the same cluster are similar and documents belonging to different clusters are dissimilar. Information retrieval plays an important role in data mining by extracting the information relevant to a user's request. It examines file contents and identifies their similarity, and its performance is measured using precision and recall. In this paper, we propose a phrase-based clustering scheme based on the Suffix Tree Document Clustering (STDC) model. The proposed algorithm uses the STDC model to obtain an accurate, equivalent representation of each document and to measure the similarity between documents. Compared with other existing methods, this clustering method reduces grouping time and improves similarity accuracy.
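The abstract describes clustering documents by the phrases they share, represented in a suffix tree. The following is a minimal Python sketch of that general idea, modeled on the base-cluster and merge steps of the classic Suffix Tree Clustering (STC) algorithm; it is an illustration under stated assumptions, not the authors' STDC scheme. Assumptions: documents are tokenized by lowercasing and splitting on whitespace, an uncompacted word-level suffix trie stands in for a compacted suffix tree, a base cluster is any phrase shared by at least two documents, and two base clusters are merged when their document overlap exceeds 0.5 of each cluster's size. The names `Node`, `build_suffix_trie`, `base_clusters`, and `merge_base_clusters` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class Node:
    """One trie node; the path from the root spells a word-level phrase."""

    def __init__(self):
        self.children = {}  # next word -> child Node
        self.docs = set()   # ids of documents containing this phrase


def build_suffix_trie(docs):
    """Insert every word-level suffix of every document.

    Assumed tokenizer: lowercase + whitespace split. The trie is not
    compacted, so a document of n words can create up to O(n^2) nodes.
    """
    root = Node()
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node.children.setdefault(word, Node())
                node.docs.add(doc_id)  # this phrase occurs in doc_id
    return root


def base_clusters(root, min_docs=2):
    """Return (phrase, doc_ids) for every phrase shared by >= min_docs documents."""
    clusters = []
    stack = [(root, ())]
    while stack:
        node, phrase = stack.pop()
        for word, child in node.children.items():
            # A child's doc set is a subset of its parent's, so pruning here
            # cannot skip a deeper phrase that would qualify.
            if len(child.docs) >= min_docs:
                child_phrase = phrase + (word,)
                clusters.append((child_phrase, child.docs))
                stack.append((child, child_phrase))
    return clusters


def merge_base_clusters(clusters, threshold=0.5):
    """Join base clusters whose overlap exceeds `threshold` of BOTH sizes
    (STC-style similarity) and return the documents of each connected group."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        a, b = clusters[i][1], clusters[j][1]
        overlap = len(a & b)
        if overlap / len(a) > threshold and overlap / len(b) > threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(set)
    for i, (_, docs) in enumerate(clusters):
        groups[find(i)] |= docs
    return sorted(groups.values(), key=sorted)


sample_docs = [
    "suffix tree document clustering",
    "suffix tree clustering of web documents",
    "stock market prices fell",
    "stock market prices rose",
]
root = build_suffix_trie(sample_docs)
print(merge_base_clusters(base_clusters(root)))  # [{0, 1}, {2, 3}]
```

Because the trie is not compacted, phrases such as `suffix` and `suffix tree` appear as separate base clusters with identical document sets; the merge step collapses them. The original STC formulation also scores base clusters by size and phrase length before merging, which this sketch omits for brevity.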

Index Terms

Computer Science
Information Sciences

Keywords

Clustering Techniques, Document Clustering, Phrase Merging, Suffix Tree