Optimization of Internet Search based on Noun Phrases and Clustering Techniques

R. Subhashini; V. Jawahar Senthil Kumar

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 20 - Number 2

Year of Publication: 2011

Authors: R. Subhashini, V. Jawahar Senthil Kumar

DOI: 10.5120/2402-3195

R. Subhashini, V. Jawahar Senthil Kumar. Optimization of Internet Search based on Noun Phrases and Clustering Techniques. International Journal of Computer Applications. 20, 2 (April 2011), 49-54. DOI=10.5120/2402-3195

@article{ 10.5120/2402-3195,

author = { R. Subhashini, V. Jawahar Senthil Kumar },

title = { Optimization of Internet Search based on Noun Phrases and Clustering Techniques },

journal = { International Journal of Computer Applications },

issue_date = { April 2011 },

volume = { 20 },

number = { 2 },

month = { April },

year = { 2011 },

issn = { 0975-8887 },

pages = { 49-54 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume20/number2/2402-3195/ },

doi = { 10.5120/2402-3195 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T20:06:47.348610+05:30

%A R. Subhashini

%A V. Jawahar Senthil Kumar

%T Optimization of Internet Search based on Noun Phrases and Clustering Techniques

%J International Journal of Computer Applications

%@ 0975-8887

%V 20

%N 2

%P 49-54

%D 2011

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Information retrieval plays a vital role in daily activities, most prominently in search engines. Retrieving relevant natural-language text documents remains challenging. Search engines typically respond to a query with low precision, retrieving many useless web pages and missing some important ones. In this paper, we apply shallow parsing and chunking, two NLP techniques, to extract noun phrases. These noun phrases serve as key phrases for ranking the documents (typically a list of titles and snippets returned by a Web search engine). Organizing Web search results into clusters lets users browse the results quickly. Traditional clustering techniques are inadequate because they do not generate clusters with highly readable names. We also propose an approach to clustering Web search results based on a phrase-based clustering algorithm known as Optimized Snippet Flat Clustering (OSFC). As an alternative to the single ordered result list of a search engine, this approach presents a list of clusters to the user. Experimental results verify the feasibility and effectiveness of our method.
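The idea of grouping search snippets under readable phrase labels can be illustrated with a minimal sketch. This is not the OSFC algorithm, whose details the abstract does not give. It assumes a small stopword list as a crude stand-in for the shallow parser and chunker, and the names `candidate_phrases`, `cluster_by_phrase`, and the sample snippets are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list stands in for a real POS tagger and chunker.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "and", "or", "to",
             "is", "are", "with", "by", "from", "how", "using"}


def candidate_phrases(text):
    """Return maximal runs of non-stopword tokens as crude noun-phrase candidates."""
    phrases, run = [], []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS:
            # A stopword closes the current run, if there is one.
            if run:
                phrases.append(" ".join(run))
            run = []
        else:
            run.append(token)
    if run:
        phrases.append(" ".join(run))
    return phrases


def cluster_by_phrase(snippets, min_size=2):
    """Group snippet indices under each phrase shared by at least min_size snippets."""
    index = defaultdict(set)
    for i, snippet in enumerate(snippets):
        for phrase in candidate_phrases(snippet):
            index[phrase].add(i)
    clusters = {p: sorted(ids) for p, ids in index.items() if len(ids) >= min_size}
    # Larger clusters first; ties are broken alphabetically by label.
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical snippets for an ambiguous query such as "jaguar".
snippets = [
    "Jaguar cars for sale in the UK",
    "The jaguar is a big cat from the Americas",
    "History of Jaguar cars and the company",
    "A big cat in the rainforest",
]

for label, ids in cluster_by_phrase(snippets):
    print(label, ids)
# big cat [1, 3]
# jaguar cars [0, 2]
```

Each cluster label is itself a phrase taken from the snippets, which is what makes the cluster names readable. A fuller implementation would replace the stopword heuristic with a real part-of-speech tagger and chunker, and would rank or merge overlapping clusters.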

Index Terms

Computer Science

Information Sciences

Keywords

Noun Phrases, Document Clustering, Information Retrieval, Natural Language Processing, Web Mining