Research Article

Focused Web Crawler with Page Change Detection Policy

Published in 2011 by Swati Mali, B.B. Meshram
International Conference and Workshop on Emerging Trends in Technology
Foundation of Computer Science USA
ICWET - Number 9
2011
Authors: Swati Mali, B.B. Meshram

Swati Mali, B.B. Meshram. Focused Web Crawler with Page Change Detection Policy. International Conference and Workshop on Emerging Trends in Technology. ICWET, 9 (2011), 51-56.

@article{
author = { Swati Mali and B.B. Meshram },
title = { Focused Web Crawler with Page Change Detection Policy },
journal = { International Conference and Workshop on Emerging Trends in Technology },
issue_date = { 2011 },
volume = { ICWET },
number = { 9 },
year = { 2011 },
issn = { 0975-8887 },
pages = { 51-56 },
numpages = { 6 },
url = { /proceedings/icwet/number9/2128-db34/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference and Workshop on Emerging Trends in Technology
%A Swati Mali
%A B.B. Meshram
%T Focused Web Crawler with Page Change Detection Policy
%J International Conference and Workshop on Emerging Trends in Technology
%@ 0975-8887
%V ICWET
%N 9
%P 51-56
%D 2011
%I International Journal of Computer Applications
Abstract

Focused crawlers aim to search only the subset of the web related to a specific topic. The major problem for such crawlers is how to retrieve the maximal set of relevant, high-quality pages. In this paper, we propose an architecture that concentrates on the page selection policy and the page revisit policy. A three-step algorithm for page refreshment serves this purpose. The first layer decides page relevance using two methods. The second layer checks whether the structure of a web page has changed, whether its text content has been altered, or whether an image has changed. A minor variation on the method of prioritizing URLs by forward link count is also discussed, so that the priority accounts for how frequently a page is updated. Finally, the third layer updates the URL repository.
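The change check described for the second layer can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how changes are detected, so the example assumes that each aspect of a page (tag structure, visible text, image sources) is reduced to a hash and that two crawled versions are compared hash by hash. The names `fingerprint` and `detect_changes` are hypothetical, and images are compared only by their `src` attribute, so an image replaced under the same URL would go unnoticed.

```python
import hashlib
from html.parser import HTMLParser


class _PageFingerprinter(HTMLParser):
    """Collects the tag sequence, visible text, and image sources of a page."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "img":
            # Assumption: an image is identified by its src attribute only.
            self.images.append(dict(attrs).get("src") or "")

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def _digest(parts):
    """Hash a list of strings into one hex digest."""
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def fingerprint(html):
    """Return one hash per aspect of the page (hypothetical helper)."""
    parser = _PageFingerprinter()
    parser.feed(html)
    parser.close()
    return {
        "structure": _digest(parser.tags),
        "text": _digest(parser.text),
        "images": _digest(parser.images),
    }


def detect_changes(old_html, new_html):
    """Return the set of aspects that differ between two page versions."""
    old, new = fingerprint(old_html), fingerprint(new_html)
    return {kind for kind in old if old[kind] != new[kind]}
```

A crawler built along these lines could store the three hashes with each URL and refetch or reindex a page only when `detect_changes` returns a non-empty set.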

References
  1. Bidoki, Yazdani et al., “FICA: A fast intelligent crawling algorithm”, Web Intelligence, IEEE/ACM/WIC International Conference on Intelligent Agent Technology, pages 635-641, 2007.
  2. Cui Xiaoqing, Yan Chun, “An evolutionary relevance calculation measure in topic crawler”, CCCM 2009, ISECS International Colloquium on Computing, Communication, Control, and Management, pages 267-270, Aug. 2009.
  3. Junghoo Cho, Hector Garcia-Molina, Lawrence Page, “Efficient crawling through URL ordering”, 7th International WWW Conference, April 14-18, Brisbane, 1998.
  4. Mukhopadhyay et al., “A New Approach to Design Domain Specific Ontology Based Web Crawler”, ICIT 2007, 10th International Conference on Information Technology, pages 289-291, Dec. 2007.
  5. Peisu, Ke et al., “A Framework of deep web crawler”, Proceedings of the 27th Chinese Control Conference, pages 582-586, July 16-18, 2008.
  6. Yadav, Sharma et al., “Architecture for parallel crawling and algorithm for change detection in web pages”, ICIT 2007, 10th International Conference on Information Technology, pages 258-264, 2007.
  7. Yuan, Yin et al., “Improvement of pagerank for focused crawler”, SNPD 2007, 8th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, pages 797-802, 2007.
  8. Zheng, Chen, “HAWK: a Focused crawler with content and link analysis”, ICEBE’08, IEEE International Conference on E-business Engineering, pages 677-680, Oct. 2008.
  9. Zheng, Zhaou et al., “URL Rule based focused crawler”, ICEBE’08, IEEE International Conference on E-business Engineering, pages 147-154, Oct. 2008.
Index Terms

Computer Science
Information Sciences

Keywords

Focused crawler, page change detection, crawler policies, crawler database