Term Importance Degree Impact on Search Result Clustering

Soheila Karbasi; Mehdi Yaghoubi

Research Article


International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 89 - Number 2

Year of Publication: 2014

Authors: Soheila Karbasi, Mehdi Yaghoubi

10.5120/15475-4164

Soheila Karbasi, Mehdi Yaghoubi. Term Importance Degree Impact on Search Result Clustering. International Journal of Computer Applications. 89, 2 (March 2014), 32-34. DOI=10.5120/15475-4164

@article{10.5120/15475-4164,
  author = {Soheila Karbasi and Mehdi Yaghoubi},
  title = {Term Importance Degree Impact on Search Result Clustering},
  journal = {International Journal of Computer Applications},
  issue_date = {March 2014},
  volume = {89},
  number = {2},
  month = {March},
  year = {2014},
  issn = {0975-8887},
  pages = {32-34},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume89/number2/15475-4164/},
  doi = {10.5120/15475-4164},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%1 2024-02-06T22:08:13.318203+05:30

%A Soheila Karbasi

%A Mehdi Yaghoubi

%T Term Importance Degree Impact on Search Result Clustering

%J International Journal of Computer Applications

%@ 0975-8887

%V 89

%N 2

%P 32-34

%D 2014

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Because current clustering algorithms must deal with the explosive growth of documents of various sizes and terms of various frequencies, an appropriate term-weighting scheme has a crucial impact on the overall performance of such systems. Term weighting is one of the critical processes for document retrieval and ranking in most search result clustering systems. In this paper we introduce a new technique for clustering algorithms that addresses the problem of indexing the terms of big datasets and their characteristics, a problem that exists in most current clustering approaches. The paper focuses on the term frequency normalization step of clustering algorithms. A new factor is applied to basic term-weighting schemes for use in the clustering process. The evaluation results confirm that this factor increases the performance of clustering techniques. The experiments were carried out with standard algorithms on the ODP-239 dataset, and the results were validated by statistical tests.
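The abstract does not define the proposed factor, so it is not reproduced here. For orientation only, the following is a minimal sketch of the kind of basic term-weighting scheme the abstract refers to: plain TF-IDF, with an optional term frequency normalization step that divides raw counts by document length. The function name `tfidf_weights`, the `normalize` flag, and the toy snippets are assumptions made for illustration; they are not taken from the paper or from the ODP-239 dataset.

```python
import math
from collections import Counter


def tfidf_weights(docs, normalize=True):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper for illustration; not the paper's method.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weights = {}
        for term, count in counts.items():
            # Term frequency normalization: divide the raw count by the
            # document length so long snippets do not dominate.
            tf = count / len(doc) if normalize else count
            # Inverse document frequency: rarer terms get larger weights.
            idf = math.log(n_docs / df[term])
            weights[term] = tf * idf
        weighted.append(weights)
    return weighted


# Toy search-result snippets (illustrative data only).
snippets = [
    "jaguar car dealer".split(),
    "jaguar cat habitat rainforest jaguar cat".split(),
    "car dealer price".split(),
]
weights = tfidf_weights(snippets)
raw_weights = tfidf_weights(snippets, normalize=False)
```

In the second snippet, "jaguar" occurs twice but appears in two of the three snippets, so its normalized weight ends up below that of "habitat", which occurs once in a single snippet. Any additional importance factor of the kind the abstract describes would presumably enter as a further multiplier on `tf * idf`, but its form cannot be inferred from the abstract alone.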

Index Terms

Computer Science

Information Sciences

Keywords

Weighted clustering, Term importance degree, Term frequency normalization