Research Article

Term Importance Degree Impact on Search Result Clustering

by Soheila Karbasi, Mehdi Yaghoubi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 89 - Number 2
Year of Publication: 2014
Authors: Soheila Karbasi, Mehdi Yaghoubi

Soheila Karbasi, Mehdi Yaghoubi. Term Importance Degree Impact on Search Result Clustering. International Journal of Computer Applications. 89, 2 (March 2014), 32-34. DOI=10.5120/15475-4164

@article{ 10.5120/15475-4164,
author = { Soheila Karbasi, Mehdi Yaghoubi },
title = { Term Importance Degree Impact on Search Result Clustering },
journal = { International Journal of Computer Applications },
issue_date = { March 2014 },
volume = { 89 },
number = { 2 },
month = { March },
year = { 2014 },
issn = { 0975-8887 },
pages = { 32-34 },
numpages = {3},
url = { },
doi = { 10.5120/15475-4164 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%A Soheila Karbasi
%A Mehdi Yaghoubi
%T Term Importance Degree Impact on Search Result Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 89
%N 2
%P 32-34
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Because clustering algorithms have to deal with the explosive growth of documents of various sizes and terms of various frequencies, an appropriate term-weighting scheme has a crucial impact on the overall performance of such systems. Term weighting is one of the critical processes for document retrieval and ranking in most search result clustering systems. In this paper we introduce a new technique for clustering algorithms that addresses a problem present in most current clustering approaches: indexing the terms of large datasets together with their characteristics. The paper focuses on the term frequency normalization step of clustering algorithms. A new factor is applied to basic term-weighting schemes for use in the clustering process. The evaluation results confirm that this factor increases the performance of clustering techniques. The experiments were carried out with standard algorithms on the ODP-239 dataset, and the results were validated by statistical tests.
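The abstract does not define the new factor, so the sketch below shows only the kind of baseline it is applied to: classic TF-IDF weighting with max-tf term frequency normalization (one common normalization among several; that the paper uses this exact variant is an assumption), followed by cosine similarity, the usual input to search result clustering. The `term_factor` argument is a hypothetical hook marking where an extra per-term multiplier could enter the weight; it is not the authors' term importance degree, and all function names here are illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased whitespace tokens; real systems also stem and drop stop words.
    return text.lower().split()


def tf_idf_vectors(docs, term_factor=None):
    """Return one {term: weight} dict per document.

    weight = (tf / max_tf) * log(N / df) * factor, where max_tf is the largest raw
    count in the same document (max-tf normalization) and factor comes from the
    hypothetical term_factor callable (1.0 when none is given).
    """
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n_docs = len(counts_per_doc)

    # Document frequency: number of documents containing each term at least once.
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())

    vectors = []
    for counts in counts_per_doc:
        max_tf = max(counts.values()) if counts else 1
        vec = {}
        for term, tf in counts.items():
            idf = math.log(n_docs / df[term])
            factor = term_factor(term) if term_factor else 1.0
            vec[term] = (tf / max_tf) * idf * factor
        vectors.append(vec)
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse weight vectors; 0.0 if either is all zeros.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Illustrative search-result snippets (made up for this example).
snippets = [
    "jaguar car dealer prices",
    "jaguar car review",
    "jaguar big cat habitat",
]
vecs = tf_idf_vectors(snippets)
print(round(cosine(vecs[0], vecs[1]), 3), round(cosine(vecs[0], vecs[2]), 3))
```

Here "jaguar" occurs in every snippet, so its IDF is log(1) = 0 and it contributes nothing to similarity: the two car snippets stay related through "car", while the animal snippet shares no non-zero term with them. Passing a `term_factor` changes only the multiplier on each term's normalized weight, which is the step of the pipeline the abstract says the new factor targets.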

Index Terms

Computer Science
Information Sciences


Weighted clustering, Term importance degree, Term frequency normalization