Research Article

Focused Crawler based on Efficient Page Rank Algorithm

by Anand Ratna, Divya, Akshay Sawhney
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 116 - Number 7
Year of Publication: 2015
Authors: Anand Ratna, Divya, Akshay Sawhney
10.5120/20351-2540

Anand Ratna, Divya, Akshay Sawhney. Focused Crawler based on Efficient Page Rank Algorithm. International Journal of Computer Applications. 116, 7 (April 2015), 37-40. DOI=10.5120/20351-2540

@article{ 10.5120/20351-2540,
author = { Anand Ratna and Divya and Akshay Sawhney },
title = { Focused Crawler based on Efficient Page Rank Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { April 2015 },
volume = { 116 },
number = { 7 },
month = { April },
year = { 2015 },
issn = { 0975-8887 },
pages = { 37-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume116/number7/20351-2540/ },
doi = { 10.5120/20351-2540 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:56:29.634683+05:30
%A Anand Ratna
%A Divya
%A Akshay Sawhney
%T Focused Crawler based on Efficient Page Rank Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 116
%N 7
%P 37-40
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The World Wide Web is growing rapidly and its content is dynamic, so an efficient search mechanism is essential. A vast number of pages are added every day, which poses exceptional scaling challenges for general-purpose crawlers and search engines and makes fetching information on a specific topic increasingly important. This paper describes a web crawling approach based on best-first search. Instead of collecting and indexing all available web documents in order to answer all possible queries, a focused crawler chooses the links that are likely to be most relevant to the crawl and avoids the irrelevant links in a document. This leads to significant savings in hardware and network resources and also helps keep the crawl up to date. To accomplish such goal-directed crawling, the approach selects the top K most relevant documents for a given query and then expands the most promising link, chosen according to its link score, so as to circumvent irrelevant regions of the web.
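
The best-first strategy summarized in the abstract can be illustrated with a minimal sketch. The Python code below is not the authors' implementation. The toy in-memory web `TOY_WEB`, the term-overlap `relevance` function, and the link score (an equal-weight mix of parent page relevance and anchor text relevance) are all assumptions made for illustration, and the seed list stands in for the top K documents selected for the query.

```python
import heapq

# Hypothetical toy web: URL -> (page text, [(anchor text, target URL), ...]).
# All names and contents are illustrative and do not come from the paper.
TOY_WEB = {
    "a": ("focused web crawler overview",
          [("cooking recipes", "b"), ("crawler link score", "c")]),
    "b": ("cooking recipes and kitchen tips", [("celebrity gossip", "d")]),
    "c": ("crawler link score and page rank",
          [("celebrity gossip", "d"), ("focused crawler relevance", "e")]),
    "d": ("celebrity gossip", []),
    "e": ("focused crawler relevance with tf idf", []),
}


def relevance(text, query_terms):
    """Fraction of query terms present in the text (a stand-in scorer)."""
    words = set(text.split())
    return sum(1 for term in query_terms if term in words) / len(query_terms)


def best_first_crawl(seeds, query_terms, max_pages, threshold=0.0):
    """Visit pages in order of decreasing link score, up to max_pages."""
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = TOY_WEB[url]
        page_score = relevance(text, query_terms)
        visited.append((url, page_score))
        if page_score <= threshold:
            continue  # do not follow links out of an irrelevant page
        for anchor, target in links:
            if target not in seen:
                seen.add(target)
                # Assumed link score: parent relevance mixed with anchor relevance.
                link_score = 0.5 * page_score + 0.5 * relevance(anchor, query_terms)
                heapq.heappush(frontier, (-link_score, target))
    return visited


if __name__ == "__main__":
    for url, score in best_first_crawl(["a"], ["focused", "crawler"], max_pages=3):
        print(url, score)
```

With a budget of three pages, this sketch reaches the relevant pages before the off-topic ones. A breadth-first crawler with the same budget would spend one of its fetches on the cooking page.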

Index Terms

Computer Science
Information Sciences

Keywords

Focused web crawler, TF-IDF, Relevancy calculation, Page Rank