Research Article

A Novel Technique for Spare Web Page Detection in Parallel Web Crawler

by Gaurav Kumar Srivastav, Irphan Ali, Atul Kumar Srivastava
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 12
Year of Publication: 2014
Authors: Gaurav Kumar Srivastav, Irphan Ali, Atul Kumar Srivastava
10.5120/16392-6009

Gaurav Kumar Srivastav, Irphan Ali, Atul Kumar Srivastava. A Novel Technique for Spare Web Page Detection in Parallel Web Crawler. International Journal of Computer Applications. 94, 12 (May 2014), 1-5. DOI=10.5120/16392-6009

@article{ 10.5120/16392-6009,
author = { Gaurav Kumar Srivastav and Irphan Ali and Atul Kumar Srivastava },
title = { A Novel Technique for Spare Web Page Detection in Parallel Web Crawler },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 94 },
number = { 12 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume94/number12/16392-6009/ },
doi = { 10.5120/16392-6009 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:18:14.721788+05:30
%A Gaurav Kumar Srivastav
%A Irphan Ali
%A Atul Kumar Srivastava
%T A Novel Technique for Spare Web Page Detection in Parallel Web Crawler
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 12
%P 1-5
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The World Wide Web is growing at an unpredictable rate, and its pages are updated rapidly to meet users' needs. A web search engine downloads web pages, yet the user often cannot obtain relevant, up-to-date information from the World Wide Web within a short period of time. In this paper, we present a novel technique that helps download updated, relevant web pages from the World Wide Web. We implement a new algorithm that identifies updated web pages by comparing the Content Weight of the old web page content with that of the newly downloaded content. The technique also avoids downloading spare web pages from the World Wide Web. Used with a parallel web crawler, it improves the download rate of web pages and reduces the network bandwidth the crawler consumes. This detection technique downloads the updated web pages from the World Wide Web and minimizes web browsing time.
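
The abstract does not define Content Weight. As a rough illustration only, the Python sketch below assumes a weight built from each character's ASCII value multiplied by its position, which the keywords hint at but do not confirm. The names content_weight, has_changed, and needs_update are hypothetical and are not taken from the paper.

```python
from typing import Dict


def content_weight(text: str) -> int:
    """Assumed weight: sum of (1-based position * character code).

    This formula is an illustrative guess, not the paper's definition.
    """
    return sum(pos * ord(ch) for pos, ch in enumerate(text, start=1))


def has_changed(old_text: str, new_text: str) -> bool:
    """Treat a page as updated when the two weights differ."""
    return content_weight(old_text) != content_weight(new_text)


def needs_update(stored_weights: Dict[str, int], url: str, fetched_text: str) -> bool:
    """Compare a fetched page against the stored weight for its URL.

    Returns True and records the new weight when the page is new or changed;
    returns False when the weight matches, so the copy can be skipped.
    """
    new_weight = content_weight(fetched_text)
    if stored_weights.get(url) == new_weight:
        return False
    stored_weights[url] = new_weight
    return True
```

Under this assumed formula, two different texts can share the same weight, so a real crawler would likely pair the comparison with a stronger check before treating a page as unchanged.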

Index Terms

Computer Science
Information Sciences

Keywords

World Wide Web, Search Engine, Web Crawler, Parallel Web Crawler, ASCII value and Position of ith