Research Article

An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler

Published on June 2018 by Rohini Navnathkhedkar, Madhuri Dalal
International Conference on Emerging Trends in Computing and Communication
Foundation of Computer Science USA
ICETCC2017 - Number 3

Rohini Navnathkhedkar, Madhuri Dalal. An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler. International Conference on Emerging Trends in Computing and Communication. ICETCC2017, 3 (June 2018), 18-22.

@article{
author = { Rohini Navnathkhedkar and Madhuri Dalal },
title = { An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler },
journal = { International Conference on Emerging Trends in Computing and Communication },
issue_date = { June 2018 },
volume = { ICETCC2017 },
number = { 3 },
month = { June },
year = { 2018 },
issn = { 0975-8887 },
pages = { 18-22 },
numpages = 5,
url = { /proceedings/icetcc2017/number3/29474-c129/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Emerging Trends in Computing and Communication
%A Rohini Navnathkhedkar
%A Madhuri Dalal
%T An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler
%J International Conference on Emerging Trends in Computing and Communication
%@ 0975-8887
%V ICETCC2017
%N 3
%P 18-22
%D 2018
%I International Journal of Computer Applications
Abstract

As the deep web grows at a very fast pace, there has been increased interest in techniques that help locate deep-web interfaces efficiently. However, because of the large volume of web resources and the dynamic nature of the deep web, achieving both wide coverage and high efficiency is challenging. We propose a two-stage framework for harvesting deep-web interfaces. In the first stage, the crawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, it ranks websites so that those highly relevant to a given topic are prioritized. In the second stage, it achieves fast in-site searching by excavating the most relevant links with adaptive link ranking.
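
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function and class names (`relevance`, `rank_sites`, `AdaptiveLinkRanker`), the term-overlap scoring, and the simple additive weight update are all assumptions chosen only to show the general shape of site ranking followed by adaptive in-site link ranking.

```python
import heapq
from collections import Counter


def relevance(text, topic_terms):
    """Fraction of topic terms that appear in the text (assumed scoring rule)."""
    words = set(text.lower().split())
    if not topic_terms:
        return 0.0
    return sum(1 for t in topic_terms if t in words) / len(topic_terms)


def rank_sites(sites, topic_terms):
    """Stage 1: order candidate sites by topic relevance, highest first.

    `sites` maps a site URL to a short description such as homepage text.
    Ties are broken alphabetically by URL.
    """
    scored = [(-relevance(desc, topic_terms), url) for url, desc in sites.items()]
    heapq.heapify(scored)
    return [heapq.heappop(scored)[1] for _ in range(len(scored))]


class AdaptiveLinkRanker:
    """Stage 2: rank in-site links and learn from links that led to forms."""

    def __init__(self, seed_terms):
        # Each seed term starts with weight 1; feedback adds more weight.
        self.weights = Counter({t: 1.0 for t in seed_terms})

    def score(self, anchor_text):
        # Unknown words contribute 0 because Counter returns 0 for missing keys.
        return sum(self.weights[w] for w in anchor_text.lower().split())

    def rank(self, links):
        """`links` is a list of (url, anchor_text); returns URLs, best first."""
        ordered = sorted(links, key=lambda link: -self.score(link[1]))
        return [url for url, _ in ordered]

    def feedback(self, anchor_text, found_form):
        """Reward the words of a link whose target page had a searchable form."""
        if found_form:
            for w in anchor_text.lower().split():
                self.weights[w] += 1.0


# Hypothetical example data.
sites = {
    "http://a.example": "used books search catalog",
    "http://b.example": "weather news",
    "http://c.example": "book store",
}
site_order = rank_sites(sites, ["books", "catalog"])

ranker = AdaptiveLinkRanker(["search"])
links = [
    ("/about", "about us"),
    ("/find", "advanced search"),
    ("/browse", "browse catalog"),
]
before = ranker.rank(links)
ranker.feedback("browse catalog", found_form=True)
after = ranker.rank(links)
```

In this sketch the link ranker starts from seed terms only, and each successful discovery of a searchable form shifts later rankings toward links with similar anchor text, which is one simple way to read "adaptive" link ranking.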

Index Terms

Computer Science
Information Sciences

Keywords

Deep Web, Ranking, Adaptive Learning, Two-stage Crawler