Research Article

Design of a Priority Based Frequency Regulated Incremental Crawler

by Niraj Singhal, Ashutosh Dixit, Dr. A. K. Sharma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 1
Year of Publication: 2010
Authors: Niraj Singhal, Ashutosh Dixit, Dr. A. K. Sharma
10.5120/23-131

Niraj Singhal, Ashutosh Dixit, Dr. A. K. Sharma. Design of a Priority Based Frequency Regulated Incremental Crawler. International Journal of Computer Applications. 1, 1 (February 2010), 42-47. DOI=10.5120/23-131

@article{ 10.5120/23-131,
author = { Niraj Singhal and Ashutosh Dixit and Dr. A. K. Sharma },
title = { Design of a Priority Based Frequency Regulated Incremental Crawler },
journal = { International Journal of Computer Applications },
issue_date = { February 2010 },
volume = { 1 },
number = { 1 },
month = { February },
year = { 2010 },
issn = { 0975-8887 },
pages = { 42-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume1/number1/23-131/ },
doi = { 10.5120/23-131 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:43:39.070802+05:30
%A Niraj Singhal
%A Ashutosh Dixit
%A Dr. A. K. Sharma
%T Design of a Priority Based Frequency Regulated Incremental Crawler
%J International Journal of Computer Applications
%@ 0975-8887
%V 1
%N 1
%P 42-47
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The World Wide Web is a huge source of hyperlinked information contained in hypertext documents. Search engines use web crawlers to collect these documents from the web for storage and indexing. However, many of these documents contain dynamic information that changes on a daily, weekly, monthly or yearly basis, so the search engine's stored copies must be refreshed to make the latest information available to the user. An incremental crawler revisits the web repeatedly at specific intervals to update its collection. This paper proposes a novel mechanism for regulating the revisit frequency and a novel architecture for an incremental crawler.
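
The general idea of regulating revisit frequency can be illustrated with a minimal sketch. This is not the mechanism proposed in the paper, which the abstract does not detail. It assumes a simple rule: pages are kept in a priority queue ordered by their next due time, and a page's revisit interval is halved when a change is detected and doubled otherwise. The names RevisitScheduler and next_interval, the interval bounds, and the example URLs are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical bounds on the revisit interval, in hours (assumptions).
MIN_INTERVAL = 1.0
MAX_INTERVAL = 720.0


@dataclass(order=True)
class Entry:
    due: float                               # time at which the page is due for a revisit
    url: str = field(compare=False)
    interval: float = field(compare=False)   # current revisit interval in hours


def next_interval(interval: float, changed: bool) -> float:
    """Assumed rule: halve the interval if the page changed, else double it."""
    new = interval / 2 if changed else interval * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, new))


class RevisitScheduler:
    """Priority queue of pages ordered by due time (earliest first)."""

    def __init__(self):
        self._heap = []

    def add(self, url: str, interval: float, now: float = 0.0) -> None:
        heapq.heappush(self._heap, Entry(now + interval, url, interval))

    def pop_due(self, now: float):
        """Return the entry with the earliest due time if it is due, else None."""
        if self._heap and self._heap[0].due <= now:
            return heapq.heappop(self._heap)
        return None

    def reschedule(self, entry: Entry, changed: bool, now: float) -> float:
        """Re-queue a visited page with an adjusted interval; return that interval."""
        interval = next_interval(entry.interval, changed)
        self.add(entry.url, interval, now)
        return interval


if __name__ == "__main__":
    sched = RevisitScheduler()
    sched.add("http://example.com/news", interval=24.0)
    sched.add("http://example.com/about", interval=48.0)
    entry = sched.pop_due(now=24.0)
    # Pretend the fetch found a change, so the page is revisited sooner.
    print(entry.url, sched.reschedule(entry, changed=True, now=24.0))
```

Under this assumed rule, frequently changing pages drift toward short intervals and static pages toward long ones, so crawl effort is concentrated where the stored copy is most likely to be stale.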

Index Terms

Computer Science
Information Sciences

Keywords

web search engine, incremental crawler, dynamic information, crawl workers