An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler

IJCA Proceedings on International Conference on Emerging Trends in Computing and Communication
© 2018 by IJCA Journal
ICETCC 2017 - Number 3
Year of Publication: 2018
Authors:
Rohini Navnathkhedkar
Madhuri Dalal

Rohini Navnathkhedkar and Madhuri Dalal. Article: An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler. IJCA Proceedings on International Conference on Emerging Trends in Computing and Communication ICETCC 2017(3):18-22, June 2018. Full text available. BibTeX

@article{key:article,
	author = {Rohini Navnathkhedkar and Madhuri Dalal},
	title = {Article: An Efficiently harvesting Deep Web Interfaces based on Two Stage Crawler},
	journal = {IJCA Proceedings on International Conference on Emerging Trends in Computing and Communication},
	year = {2018},
	volume = {ICETCC 2017},
	number = {3},
	pages = {18-22},
	month = {June},
	note = {Full text available}
}

Abstract

As the deep web grows at a very fast pace, there has been increased interest in techniques that help locate deep-web interfaces efficiently. However, because of the large volume of web resources and the dynamic nature of the deep web, achieving both wide coverage and high efficiency is challenging. We propose a two-stage framework for harvesting deep-web interfaces. In the first stage, the crawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, it ranks websites so that those most relevant to a given topic are prioritized. In the second stage, it achieves fast in-site searching by excavating the most relevant links with adaptive link ranking.
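The abstract does not give implementation details, so the following is only a minimal sketch of the two stages under stated assumptions. The scoring rule (fraction of topic terms present), the names `relevance`, `rank_sites`, and `AdaptiveLinkRanker`, and the 0.5 weight increment are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def relevance(text, topic_terms):
    """Toy relevance: fraction of topic terms that occur as words in the text."""
    words = set(text.lower().split())
    return sum(term in words for term in topic_terms) / len(topic_terms)


def rank_sites(sites, topic_terms):
    """Stage 1 (assumed form): order candidate sites, most relevant first."""
    return sorted(
        sites,
        key=lambda site: relevance(site["text"], topic_terms),
        reverse=True,
    )


class AdaptiveLinkRanker:
    """Stage 2 (assumed form): rank in-site links by anchor-text term weights.

    Weights start at 1.0 for topic terms and grow for words seen on links
    that actually led to a searchable form, so the ranking adapts over time.
    """

    def __init__(self, topic_terms, step=0.5):
        self.weights = Counter({term: 1.0 for term in topic_terms})
        self.step = step  # hypothetical learning increment

    def score(self, anchor_text):
        # Unknown words contribute 0 because Counter returns 0 for missing keys.
        return sum(self.weights[word] for word in anchor_text.lower().split())

    def feedback(self, anchor_text, found_form):
        # Reinforce the words of a link only when it led to a form.
        if found_form:
            for word in anchor_text.lower().split():
                self.weights[word] += self.step

    def order(self, links):
        return sorted(links, key=lambda link: self.score(link["anchor"]), reverse=True)
```

In use, `rank_sites` would be called once on the candidate sites returned by a search engine, and `AdaptiveLinkRanker.order` would be called repeatedly inside each site, with `feedback` invoked after every fetched page so that later links are ranked using what earlier links revealed.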

References

  • Feng Zhao, Jingyu Zhou, Chang Nie, Heqing Huang, and Hai Jin, "SmartCrawler: A Two Stage Crawler for efficiently harvesting Deep-Web interfaces," IEEE Transactions on Services Computing, Volume: 99 PP, Year: 2015.
  • L. Barbosa and J. Freire, "An adaptive crawler for locating hidden web entry points," in Proc. 16th Int. Conf. World Wide Web, 2007, pp. 441–450.
  • . Olston and M. Najork, "Web Crawling," Foundations and Trends in Information Retrieval, vol. 4, no. 3, pp. 175–246, 20.
  • Y. He, D. Xin, V. Ganti, S. Rajaraman, and N. Shah, "Crawling deep web entity pages," in Proc. 6th ACM Int. Conf. Web Search Data Mining, 2013, pp. 355–364.
  • L. Barbosa and J. Freire, "Searching for hidden-web databases," in Proc. 8th Int. Workshop Web Databases, 2005, pp. 1–6.
  • Rabia and Sami, Lalitha K., "Understanding the Deep Web" (2010). Library Philosophy and Practice (e-journal). Paper 364. http://digitalcommons.unl.edu/libphilprac.