Research Article

A Query based Approach to Reduce the Web Crawler Traffic using HTTP Get Request and Dynamic Web Page

by Shekhar Mishra, Anurag Jain, Dr. A.K. Sachan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 14 - Number 3
Year of Publication: 2011
Authors: Shekhar Mishra, Anurag Jain, Dr. A.K. Sachan
10.5120/1826-2406

Shekhar Mishra, Anurag Jain, Dr. A.K. Sachan. A Query based Approach to Reduce the Web Crawler Traffic using HTTP Get Request and Dynamic Web Page. International Journal of Computer Applications. 14, 3 (January 2011), 8-14. DOI=10.5120/1826-2406

@article{ 10.5120/1826-2406,
author = { Shekhar Mishra and Anurag Jain and A.K. Sachan },
title = { A Query based Approach to Reduce the Web Crawler Traffic using HTTP Get Request and Dynamic Web Page },
journal = { International Journal of Computer Applications },
issue_date = { January 2011 },
volume = { 14 },
number = { 3 },
month = { January },
year = { 2011 },
issn = { 0975-8887 },
pages = { 8-14 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume14/number3/1826-2406/ },
doi = { 10.5120/1826-2406 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:02:25.147480+05:30
%A Shekhar Mishra
%A Anurag Jain
%A Dr. A.K. Sachan
%T A Query based Approach to Reduce the Web Crawler Traffic using HTTP Get Request and Dynamic Web Page
%J International Journal of Computer Applications
%@ 0975-8887
%V 14
%N 3
%P 8-14
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A web crawler downloads information from the web for a search engine. Because web pages change without notice, the crawler has to revisit web sites to download new and updated pages. It is estimated that 40% of current web traffic is generated by web crawlers. This paper proposes a query-based approach that informs the web crawler of updates on a web site using a dynamic web page and an HTTP GET request. The dynamic web page generates an HTML response listing the pages updated on the web site since the crawler's last visit, so the crawler visits only the updated pages instead of crawling the full web site for updates. The proposed scheme was tested, and the results show that it is promising.
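The request/response exchange summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The dynamic page name `updates.html`, the query parameter `since` (the crawler's last-visit time as a Unix timestamp), the in-memory `last_modified` map standing in for however the site records page modification times, and all helper names are hypothetical. The server-side dynamic page is modelled as a plain function rather than a running web server so the logic can be read and tested on its own.

```python
"""Sketch of a query-based update list for crawlers.

Hypothetical assumptions: the dynamic page is /updates.html, the crawler
sends its last-visit time as a Unix timestamp in the `since` query
parameter, and the server replies with an HTML <ul> of links to pages
modified after that time.
"""
from html import escape
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode, urlsplit


# --- Server side: the dynamic web page (hypothetical) ----------------------
def render_update_list(request_target: str, last_modified: dict[str, int]) -> str:
    """Return an HTML page listing URLs modified after the `since` value."""
    query = parse_qs(urlsplit(request_target).query)
    since = int(query.get("since", ["0"])[0])  # missing `since` -> list everything
    changed = sorted(url for url, mtime in last_modified.items() if mtime > since)
    items = "".join(f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in changed)
    return f"<html><body><ul>{items}</ul></body></html>"


# --- Crawler side ----------------------------------------------------------
def build_update_query(site: str, last_visit: int) -> str:
    """Build the single GET request URL sent instead of recrawling the site."""
    return f"{site.rstrip('/')}/updates.html?{urlencode({'since': last_visit})}"


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in the server's response."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def parse_update_list(html_text: str) -> list[str]:
    """Extract the URLs the crawler should re-download."""
    collector = _LinkCollector()
    collector.feed(html_text)
    return collector.links


if __name__ == "__main__":
    # Hypothetical site: page path -> last-modification time (Unix seconds).
    site_index = {"/index.html": 100, "/news.html": 250, "/about.html": 90}
    request = build_update_query("http://example.com", last_visit=200)
    response = render_update_list(request, site_index)
    print(request)                      # http://example.com/updates.html?since=200
    print(parse_update_list(response))  # ['/news.html']
```

In this sketch the crawler issues one GET request per visit and re-downloads only the URLs in the returned list; pages not listed are skipped, which is where the traffic reduction would come from. It ignores deleted pages and clock differences between crawler and server, both of which a real deployment would have to handle.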

Index Terms

Computer Science
Information Sciences

Keywords

Web Search Engine, Web, Web Crawler, Web Crawling Traffic, HTTP GET request, query