Focused Crawler based on Efficient Page Rank Algorithm

Anand Ratna; Divya; Akshay Sawhney

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 116 - Number 7

Year of Publication: 2015

Authors: Anand Ratna, Divya, Akshay Sawhney

10.5120/20351-2540

Anand Ratna, Divya, Akshay Sawhney. Focused Crawler based on Efficient Page Rank Algorithm. International Journal of Computer Applications. 116, 7 (April 2015), 37-40. DOI=10.5120/20351-2540

@article{ 10.5120/20351-2540,
author = { Anand Ratna and Divya and Akshay Sawhney },
title = { Focused Crawler based on Efficient Page Rank Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { April 2015 },
volume = { 116 },
number = { 7 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 37-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume116/number7/20351-2540/ },
doi = { 10.5120/20351-2540 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:56:29.634683+05:30
%A Anand Ratna
%A Divya
%A Akshay Sawhney
%T Focused Crawler based on Efficient Page Rank Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 116
%N 7
%P 37-40
%D 2015
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Because the WWW is growing rapidly and its content is dynamic, an efficient search mechanism is essential. A vast number of pages are added every day, which poses exceptional scaling challenges for general-purpose crawlers and search engines and makes retrieving information on a specific topic increasingly important. This paper describes a web crawling approach based on best-first search. Instead of collecting and indexing all available web documents in order to answer all possible queries, a focused crawler chooses the links that are likely to be most relevant to the crawl and avoids the irrelevant links in a document. This leads to significant savings in hardware and network resources and also helps keep the crawl up to date. To accomplish such goal-directed crawling, the approach selects the top K most relevant documents for a given query and then expands the most promising link, chosen according to its link score, so as to circumvent irrelevant regions of the web.
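The abstract does not give the scoring details, so the following is only a minimal sketch of best-first focused crawling under stated assumptions, not the authors' implementation. It assumes relevance is the cosine similarity of raw term-frequency vectors, that an unvisited link inherits the relevance of the page it was found on as its link score, and that the web is a small hypothetical in-memory graph (`TOY_WEB`) rather than live pages. All function and variable names are illustrative.

```python
import heapq
import math
from collections import Counter

# Hypothetical in-memory "web": url -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("web crawler search index", ["a", "b"]),
    "a": ("focused crawler topic relevance crawler", ["c"]),
    "b": ("cooking recipes pasta", ["d"]),
    "c": ("focused crawler page rank link score", []),
    "d": ("pasta sauce tomato", []),
}


def cosine_relevance(query, text):
    """Cosine similarity between raw term-frequency vectors (an assumed measure)."""
    q, t = Counter(query.split()), Counter(text.split())
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def best_first_crawl(web, seeds, query, max_pages, k):
    """Fetch at most max_pages pages, always expanding the best-scored link first,
    and return the k most relevant fetched pages as (score, url) pairs."""
    # heapq is a min-heap, so scores are negated to pop the highest score first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    fetched = []
    while frontier and len(fetched) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        score = cosine_relevance(query, text)
        fetched.append((score, url))
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                # Assumed link score: the relevance of the page containing the link.
                heapq.heappush(frontier, (-score, link))
    return sorted(fetched, key=lambda p: (-p[0], p[1]))[:k]


top = best_first_crawl(TOY_WEB, ["seed"], "focused crawler", max_pages=3, k=3)
top_urls = [url for _, url in top]
```

With a budget of three pages the crawl follows `seed`, then `a`, then `c`, and never spends a fetch on the off-topic branch `b` and `d`, which is the resource saving the abstract refers to. A production crawler would replace the toy graph with real fetching and would likely use TF-IDF weighting and a PageRank-style link score, as the keywords suggest.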

Index Terms

Computer Science

Information Sciences

Keywords

Focused web crawler, TF-IDF, Relevancy calculation, Page Rank