Technique for Proficiently Yielding Deep-Web Interfaces using Smart Crawler

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Authors:
Devendra Hapase, M. D. Ingle
DOI: 10.5120/ijca2016910672

Devendra Hapase and M. D. Ingle. Technique for Proficiently Yielding Deep-Web Interfaces using Smart Crawler. International Journal of Computer Applications, 146(4):28-32, July 2016.

BibTeX

@article{10.5120/ijca2016910672,
	author = {Devendra Hapase and M. D. Ingle},
	title = {Technique for Proficiently Yielding Deep-Web Interfaces using Smart Crawler},
	journal = {International Journal of Computer Applications},
	issue_date = {July 2016},
	volume = {146},
	number = {4},
	month = {Jul},
	year = {2016},
	issn = {0975-8887},
	pages = {28-32},
	numpages = {5},
	url = {http://www.ijcaonline.org/archives/volume146/number4/25387-2016910672},
	doi = {10.5120/ijca2016910672},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Nowadays, the World Wide Web has grown enormously, and with it the demand for techniques that can efficiently locate deep-web interfaces. A web crawler is a program that traverses the World Wide Web automatically; this activity is also called web crawling or spidering. In the proposed system, the first stage of Smart Crawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of irrelevant pages; focusing the crawl in this way yields accurate results. Sites are ranked so that the most valuable ones are visited first, and fast in-site searching is achieved by selecting the most promising links with an adaptive link-ranking strategy. Deep-web databases are often not linked to any web search engine; they are sparsely distributed and change constantly. Two kinds of crawlers address this problem: generic crawlers and focused crawlers. Generic crawlers collect every form they can find and do not concentrate on a specific topic. Focused crawlers such as the Form-Focused Crawler (FFC) and the Adaptive Crawler for Hidden-web Entries (ACHE) can continuously search for online databases on a particular topic. FFC uses link, page, and form classifiers for focused crawling of web forms, and ACHE extends it with additional components for form filtering and an adaptive link learner. The proposed system uses a Naive Bayes classifier instead of an SVM for the searchable-form classifier (SFC) and the domain-specific form classifier (DSFC). In machine learning, Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
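The abstract does not include the classifier itself; as a rough illustration of the Naive Bayes approach it describes, the following is a minimal pure-Python multinomial Naive Bayes over bag-of-words form text. The class name, the toy form tokens, and the labels are all hypothetical, chosen only to show how Bayes' theorem with the naive independence assumption applies to form classification.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesFormClassifier:
    """Minimal multinomial Naive Bayes over bag-of-words form text (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha       # Laplace smoothing constant
        self.class_priors = {}   # class -> log P(class)
        self.word_counts = {}    # class -> Counter of word frequencies
        self.totals = {}         # class -> total word count
        self.vocab = set()

    def fit(self, texts, labels):
        class_docs = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        n = len(labels)
        self.class_priors = {c: math.log(cnt / n) for c, cnt in class_docs.items()}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}

    def predict(self, text):
        words = text.lower().split()
        v = len(self.vocab)
        best, best_score = None, float("-inf")
        for c, prior in self.class_priors.items():
            score = prior
            for w in words:
                # Laplace-smoothed log-likelihood of word w under class c,
                # summed under the naive independence assumption
                num = self.word_counts[c][w] + self.alpha
                den = self.totals[c] + self.alpha * v
                score += math.log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best

# Toy training data: tokens taken from form labels and buttons (hypothetical).
forms = ["search query keyword submit", "title author isbn search",
         "username password login remember", "email password sign in"]
labels = ["searchable", "searchable", "login", "login"]

clf = NaiveBayesFormClassifier()
clf.fit(forms, labels)
print(clf.predict("keyword title search"))   # -> searchable
```

In a real SFC/DSFC pipeline, the tokens would come from parsing each form's labels, field names, and button text, and the "searchable" class would mark candidate deep-web entry points.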
The proposed system also contributes a novel user-login module that restricts crawling of a particular domain to authorized users, based on the data the client provides; this information is additionally used to filter the results. The system further implements the concepts of pre-query and post-query analysis: pre-query analysis works only with the form itself and the pages that contain it, while post-query analysis uses data collected from the results of form submissions.
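The pre-query idea above can be sketched as a heuristic that judges a form purely from its own markup, before anything is submitted. This is an assumption-laden illustration, not the paper's method: the function name, the hint word lists, and the field tuples are all hypothetical.

```python
# Hypothetical pre-query check: classify a form from its fields alone,
# with no form submission involved. Hint sets are illustrative.
SEARCH_HINTS = {"search", "query", "keyword", "q", "find"}
LOGIN_HINTS = {"password", "login", "signin"}

def looks_searchable(fields):
    """fields: list of (name, input_type) pairs parsed from a <form> element."""
    names = {name.lower() for name, _ in fields}
    types = {t.lower() for _, t in fields}
    if "password" in types or names & LOGIN_HINTS:
        return False           # login forms are not deep-web entry points
    has_text_box = "text" in types
    has_hint = bool(names & SEARCH_HINTS)
    return has_text_box and has_hint

print(looks_searchable([("q", "text"), ("btn", "submit")]))            # -> True
print(looks_searchable([("user", "text"), ("password", "password")]))  # -> False
```

A post-query analysis, by contrast, would submit probe queries to the form and inspect the returned result pages, which this sketch deliberately does not do.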

References

  1. Feng Zhao, Jingyu Zhou, Chang Nie, Heqing Huang, and Hai Jin. SmartCrawler: A two-stage crawler for efficiently harvesting deep-web interfaces. IEEE Transactions on Services Computing, volume PP, 2015.
  2. Soumen Chakrabarti, Martin van den Berg, and Byron Dom. Focused crawling: a new approach to topic-specific Web resource discovery. Computer Networks, 31(11):1623-1640, 1999.
  3. Kevin Chen-Chuan Chang, Bin He, Chengkai Li, Mitesh Patel, and Zhen Zhang. Structured databases on the web: Observations and implications. ACM SIGMOD Record, 33(3):61-70, 2004.
  4. Soumen Chakrabarti, Kunal Punera, and Mallela Subramanyam. Accelerated focused crawling through online relevance feedback. In Proceedings of the 11th International Conference on World Wide Web, pages 148-159, 2002.
  5. Sriram Raghavan and Hector Garcia-Molina. Crawling the hidden web. In Proceedings of the 27th International Conference on Very Large Data Bases, pages 129-138, 2001.
  6. Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin Dong, David Ko, Cong Yu, and Alon Halevy. Web-scale data integration: You can only afford to pay as you go. In Proceedings of CIDR, pages 342-350, 2007.
  7. Jared Cope, Nick Craswell, and David Hawking. Automated discovery of search interfaces on the Web. In Proceedings of the 14th Australasian Database Conference, Volume 17, pages 181-189. Australian Computer Society, Inc., 2003.
  8. Thomas Kabisch, Eduard C. Dragut, Clement Yu, and Ulf Leser. Deep web integration with VisQI. Proceedings of the VLDB Endowment, 3(1-2):1613-1616, 2010.

Keywords

Deep web, crawler, feature selection, ranking, adaptive learning, Web resource discovery.