An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler

Research Article

Published on June 2018 by Rohini Navnathkhedkar, Madhuri Dalal

International Conference on Emerging Trends in Computing and Communication

Foundation of Computer Science USA

ICETCC2017 - Number 3

June 2018

Authors: Rohini Navnathkhedkar, Madhuri Dalal

Rohini Navnathkhedkar, Madhuri Dalal. An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler. International Conference on Emerging Trends in Computing and Communication. ICETCC2017, 3 (June 2018), 18-22.

@article{
author = { Rohini Navnathkhedkar and Madhuri Dalal },
title = { An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler },
journal = { International Conference on Emerging Trends in Computing and Communication },
issue_date = { June 2018 },
volume = { ICETCC2017 },
number = { 3 },
month = { June },
year = { 2018 },
issn = { 0975-8887 },
pages = { 18-22 },
numpages = { 5 },
url = { /proceedings/icetcc2017/number3/29474-c129/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article
%1 International Conference on Emerging Trends in Computing and Communication
%A Rohini Navnathkhedkar
%A Madhuri Dalal
%T An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler
%J International Conference on Emerging Trends in Computing and Communication
%@ 0975-8887
%V ICETCC2017
%N 3
%P 18-22
%D 2018
%I International Journal of Computer Applications

Abstract

As the deep web grows at a very fast pace, interest has increased in techniques that locate deep-web interfaces efficiently. However, because of the large volume of web resources and the dynamic nature of the deep web, achieving both wide coverage and high efficiency is challenging. We propose a two-stage framework for harvesting deep-web interfaces. In the first stage, the crawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, it ranks websites so that those most relevant to a given topic are prioritized. In the second stage, it achieves fast in-site searching by excavating the most relevant links with adaptive link ranking.
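
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `rank_sites` and `AdaptiveLinkRanker`, the term-overlap site score, and the additive weight update are all hypothetical choices, and the sketch works on in-memory strings instead of fetching pages.

```python
import heapq
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric terms."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def rank_sites(sites, topic_terms):
    """Stage 1 (assumed scoring): order sites by overlap with topic terms.

    `sites` maps a site URL to a short description, for example a
    search-engine snippet. Returns the URLs, most relevant first.
    """
    topic = set(topic_terms)
    scored = []
    for url, description in sites.items():
        score = sum(1 for term in tokenize(description) if term in topic)
        scored.append((-score, url))  # negate so the heap pops high scores first
    heapq.heapify(scored)
    return [heapq.heappop(scored)[1] for _ in range(len(scored))]


class AdaptiveLinkRanker:
    """Stage 2 (assumed update rule): rank in-site links and learn from hits."""

    def __init__(self, seed_terms):
        # Terms that initially suggest a link leads to a searchable form.
        self.weights = Counter({term: 1.0 for term in seed_terms})

    def score(self, link_text):
        # Unknown terms contribute 0 because Counter returns 0 for missing keys.
        return sum(self.weights[term] for term in tokenize(link_text))

    def rank(self, links):
        # Highest score first; ties keep their input order (stable sort).
        return sorted(links, key=self.score, reverse=True)

    def feedback(self, link_text, found_form):
        """Reinforce the terms of a link that reached a searchable form."""
        if found_form:
            for term in tokenize(link_text):
                self.weights[term] += 1.0
```

In use, a crawler built this way would call `rank_sites` once per topic to choose which sites to visit, then call `rank` on each page's outgoing links and `feedback` after each fetch, so that link terms which repeatedly lead to forms rise in priority over time.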

Index Terms

Computer Science

Information Sciences

Keywords

Deep Web, Ranking, Adaptive Learning, Two-stage Crawler