Enhance Crawler: A Dual-Stage Crawler for Efficiently Harvesting Deep Web Interfaces

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Authors:
Sujata R. Gutte, Shubhangi S. Gujar
10.5120/ijca2017914483

Sujata R. Gutte and Shubhangi S. Gujar. Enhance Crawler: A Dual-Stage Crawler for Efficiently Harvesting Deep Web Interfaces. International Journal of Computer Applications 168(8):23-26, June 2017.

BibTeX

@article{10.5120/ijca2017914483,
	author = {Sujata R. Gutte and Shubhangi S. Gujar},
	title = {Enhance Crawler: A Dual-Stage Crawler for Efficiently Harvesting Deep Web Interfaces},
	journal = {International Journal of Computer Applications},
	issue_date = {June 2017},
	volume = {168},
	number = {8},
	month = {Jun},
	year = {2017},
	issn = {0975-8887},
	pages = {23-26},
	numpages = {4},
	url = {http://www.ijcaonline.org/archives/volume168/number8/27896-2017914483},
	doi = {10.5120/ijca2017914483},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

The Internet has become an important part of day-to-day life. Because of this heavy usage, a very large amount of diverse data is spread across it and is available to search. A challenging issue for a search engine is fetching the data most relevant to the user's need. To reduce the large amount of time spent searching for the most relevant data, we propose the "enhanced crawler". The proposed framework is divided into two stages. In the first stage, a search engine is used to perform site-based searching for center pages, which gives more accurate results for a focused crawler because it avoids visiting a large number of pages; ranking is then used to prioritize the sites most relevant to the given input topic. In the second stage, in-site searching is done by extracting the most relevant links with adaptive link ranking. We design a link tree data structure to achieve wider coverage of a deep website.
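
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `rank_sites`, `AdaptiveLinkRanker`, `build_link_tree`, and `pick_links`, the term-overlap scoring, the 0.5 weight increment, and the round-robin selection across directories are all assumptions chosen for illustration. No network access is performed; the inputs are plain URLs and text.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def tokens(text):
    """Return the lowercase alphanumeric tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sites(sites, topic_terms):
    """Stage 1 (assumed scoring): order sites by topic-term overlap.

    `sites` maps a site URL to a short text such as its homepage content.
    """
    topic = set(topic_terms)

    def score(item):
        url, text = item
        return len(topic & set(tokens(url + " " + text)))

    return [url for url, _ in sorted(sites.items(), key=score, reverse=True)]


class AdaptiveLinkRanker:
    """Stage 2 (assumed scheme): link scores come from learned term weights."""

    def __init__(self, topic_terms):
        self.weights = {term: 1.0 for term in topic_terms}

    def _terms(self, url, anchor_text):
        # Only the URL path and the anchor text are used as features.
        return set(tokens(urlparse(url).path + " " + anchor_text))

    def score(self, url, anchor_text=""):
        return sum(self.weights.get(t, 0.0) for t in self._terms(url, anchor_text))

    def feedback(self, url, anchor_text, found_form):
        """Reinforce the terms of a link that led to a searchable form."""
        if found_form:
            for t in self._terms(url, anchor_text):
                self.weights[t] = self.weights.get(t, 0.0) + 0.5


def build_link_tree(urls):
    """Group in-site links by first path segment (a one-level link tree)."""
    tree = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        tree[segments[0] if segments else ""].append(url)
    return tree


def pick_links(urls, ranker, budget):
    """Pick up to `budget` links, cycling over directories for wider coverage."""
    tree = build_link_tree(urls)
    branches = [sorted(links, key=ranker.score, reverse=True)
                for links in tree.values()]
    picked = []
    depth = 0
    while len(picked) < budget and any(depth < len(b) for b in branches):
        for branch in branches:
            if depth < len(branch) and len(picked) < budget:
                picked.append(branch[depth])
        depth += 1
    return picked
```

With a budget of 3 and links spread over three directories, `pick_links` returns one link from each directory before taking a second link from any of them. This is one simple way to read the abstract's goal of wider coverage; the paper's actual link tree may balance links differently.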

Keywords

Enhance crawler, deep website, ranking, adaptive learning.