
Available Challenges and Guidelines in the Field of Deep Web and Intensive Crawling

International Journal of Computer Applications
© 2013 by IJCA Journal
Volume 77 - Number 1
Year of Publication: 2013
Authors:
Yasin Ezatdoost
Ali Tourani
Amir Seyed Danesh
10.5120/13355-0948

Yasin Ezatdoost, Ali Tourani and Amir Seyed Danesh. Article: Available Challenges and Guidelines in the Field of Deep Web and Intensive Crawling. International Journal of Computer Applications 77(1):1-5, September 2013. Full text available.

BibTeX

@article{key:article,
	author = {Yasin Ezatdoost and Ali Tourani and Amir Seyed Danesh},
	title = {Article: Available Challenges and Guidelines in the Field of Deep Web and Intensive Crawling},
	journal = {International Journal of Computer Applications},
	year = {2013},
	volume = {77},
	number = {1},
	pages = {1-5},
	month = {September},
	note = {Full text available}
}

Abstract

Today, a great deal of information is available on the Web, and much of it can be accessed only through search interfaces. A web crawler is an automated program that browses the Web independently: it starts from a "seed URL" and then follows the links found on each page. Because content that sits behind search forms cannot be reached by following links alone, this approach confronts many existing crawlers with fundamental difficulties. Identifying search interfaces and selecting appropriate queries, on the one hand, and retrieving the documents returned as results, on the other, are issues that intensify the challenges facing web crawlers. The aim of this paper is to investigate the existing challenges and guidelines in the field of deep web and intensive crawling.
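To make the seed-URL, link-following process concrete, here is a minimal breadth-first crawler sketch. It assumes pages are supplied by a caller-provided `fetch` function (here an in-memory dictionary, so no network access is needed). It also flags pages that contain a form with a text or search input as candidate search interfaces; that heuristic is an illustrative assumption, not the authors' method. The names `crawl`, `PageParser`, and `SITE` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects outgoing <a href> links and flags forms that look like search interfaces."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.has_search_form = False
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self._in_form = True
        elif tag == "input" and self._in_form:
            # Assumed heuristic: a text or search box inside a form marks the
            # page as a candidate entry point to hidden (deep web) content.
            input_type = (attrs.get("type") or "text").lower()
            if input_type in ("text", "search"):
                self.has_search_form = True

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl starting from seed_url.

    fetch(url) returns the page's HTML, or None if the page is unavailable.
    Returns (visited_urls, urls_of_pages_with_search_forms).
    """
    frontier = deque([seed_url])
    seen = {seed_url}
    visited, search_pages = [], []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = PageParser()
        parser.feed(html)
        if parser.has_search_form:
            search_pages.append(url)
        # Only hyperlinks are followed; form actions are never submitted.
        for href in parser.links:
            absolute = urljoin(url, href)
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited, search_pages


# Hypothetical three-page site used as the fetch source.
SITE = {
    "http://example.com/": '<a href="/books">Books</a> <a href="/about">About</a>',
    "http://example.com/books": '<form action="/search"><input type="text" name="q">'
                                '<input type="submit"></form>',
    "http://example.com/about": '<a href="/">Home</a>',
}

visited, search_pages = crawl("http://example.com/", SITE.get)
print(visited)       # ['http://example.com/', 'http://example.com/books', 'http://example.com/about']
print(search_pages)  # ['http://example.com/books']
```

Because the crawler follows only `<a href>` links, the result pages behind the `/search` form are never reached; the crawler can only report that a search interface exists. Choosing queries to submit through such forms and retrieving the returned documents is the deep-web problem described in the abstract.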
