Research Article

Semantic Suffix Net Clustering for Search Results

by Jongkol Janruang, Sumanta Guha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 59 - Number 7
Year of Publication: 2012
Authors: Jongkol Janruang, Sumanta Guha
10.5120/9557-4017

Jongkol Janruang, Sumanta Guha. Semantic Suffix Net Clustering for Search Results. International Journal of Computer Applications. 59, 7 (December 2012), 1-8. DOI=10.5120/9557-4017

@article{ 10.5120/9557-4017,
author = { Jongkol Janruang and Sumanta Guha },
title = { Semantic Suffix Net Clustering for Search Results },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 59 },
number = { 7 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume59/number7/9557-4017/ },
doi = { 10.5120/9557-4017 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:03:31.172233+05:30
%A Jongkol Janruang
%A Sumanta Guha
%T Semantic Suffix Net Clustering for Search Results
%J International Journal of Computer Applications
%@ 0975-8887
%V 59
%N 7
%P 1-8
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Suffix Tree Clustering (STC) uses a suffix tree to find sets of snippets that share a common phrase and proposes clusters from them. STC is therefore a fast, incremental algorithm for automatic clustering and labeling, but it cannot cluster semantically similar snippets. Yet word meaning is an important property that relates words to one another even when their text strings do not match. In this paper, we propose a new semantic search results clustering algorithm called semantic suffix net clustering (SSNC), which is based on the semantic suffix net (SSN) structure. The proposed algorithm uses a net pruning technique that merges related suffixes through their suffix links to find base clusters. As a result, both string matching and word meaning serve as conditions for clustering. Experimental results show that the proposed algorithm has lower time complexity than CFWMS, SSTC, and STC+GSSN, which are current semantic search results clustering methods. Moreover, its F-measure is similar to that of the original STC, CFWMS, and STC+GSSN, and higher than that of MSRC and SSTC.
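
To make the distinction between pure string matching and meaning-aware matching concrete, here is a minimal Python sketch. It is not the SSN structure or the net pruning technique from the paper: it replaces the suffix tree with a plain index of word n-grams, and it models word meaning with a hypothetical hand-written `synonyms` map. The function name `base_clusters` and its parameters are likewise assumptions made for illustration only.

```python
from collections import defaultdict


def base_clusters(snippets, min_docs=2, max_len=3, synonyms=None):
    """Group snippet ids by shared word n-grams.

    A simplified stand-in for suffix-tree phrase discovery: every phrase of
    up to `max_len` words is indexed, and a phrase shared by at least
    `min_docs` snippets forms a base cluster labeled by that phrase.
    `synonyms` is a hypothetical word -> canonical word map (an assumption,
    not the paper's semantic resource).
    """
    synonyms = synonyms or {}
    phrase_to_docs = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        # Replace each word by its canonical form so synonyms can match.
        words = [synonyms.get(w, w) for w in text.lower().split()]
        for start in range(len(words)):
            last = min(start + max_len, len(words))
            for end in range(start + 1, last + 1):
                phrase_to_docs[tuple(words[start:end])].add(doc_id)
    return {
        " ".join(phrase): docs
        for phrase, docs in phrase_to_docs.items()
        if len(docs) >= min_docs
    }


snippets = [
    "cheap car rental",
    "cheap automobile rental",
    "car insurance quotes",
]

# String matching only: snippets 0 and 1 share just single words.
plain = base_clusters(snippets)

# With the hypothetical synonym map, snippets 0 and 1 share a full phrase.
semantic = base_clusters(snippets, synonyms={"automobile": "car"})
```

In this toy input, string matching alone never groups the first two snippets under a multi-word label, whereas the synonym-aware run yields the base cluster "cheap car rental" for both. The actual SSNC algorithm reaches a comparable effect by merging related suffixes through suffix links rather than by rewriting words up front.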

Index Terms

Computer Science
Information Sciences

Keywords

search results clustering, semantic suffix net, net pruning techniques, semantic suffix net clustering, semantic clustering