Research Article

A Survey: World Wide Web and the Search Engines

Published in February 2013 by Bharat Bhushan Agarwal, Sonia Gupta
International Conference on Advances in Computer Application 2013
Foundation of Computer Science USA
ICACA2013 - Number 1

Bharat Bhushan Agarwal, Sonia Gupta. A Survey: World Wide Web and the Search Engines. International Conference on Advances in Computer Application 2013. ICACA2013, 1 (February 2013), 40-43.

@article{ 10396-1016,
author = { Bharat Bhushan Agarwal and Sonia Gupta },
title = { A Survey: World Wide Web and the Search Engines },
journal = { International Conference on Advances in Computer Application 2013 },
issue_date = { February 2013 },
volume = { ICACA2013 },
number = { 1 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 40-43 },
numpages = 4,
url = { /proceedings/icaca2013/number1/10396-1016/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}
%0 Proceeding Article
%1 International Conference on Advances in Computer Application 2013
%A Bharat Bhushan Agarwal
%A Sonia Gupta
%T A Survey: World Wide Web and the Search Engines
%J International Conference on Advances in Computer Application 2013
%@ 0975-8887
%V ICACA2013
%N 1
%P 40-43
%D 2013
%I International Journal of Computer Applications
Abstract

The World Wide Web is one of the most popular and most rapidly growing aspects of the Internet. The ways in which computer scientists attempt to estimate its size range from educated guesses to extensive analyses of search engine databases. We present a new way of measuring the size of the World Wide Web using "quadrat counts", a technique used by biologists for population sampling.

There has been an exponential growth in the number of hypermedia and web modeling languages on the market. This growth has highlighted new problems and new areas of research. This paper categorizes and reviews the main hypermedia and web modeling languages, showing their origin and their primary focus, and concludes with recommendations for further research in this field.

When information is extracted automatically from the World Wide Web, most established methods focus on classifying single HTML documents. The problem of classifying complete web sites, however, has not yet been handled adequately, despite its importance for various applications. This paper therefore discusses the classification of complete web sites. First, we point out the main differences from page classification by discussing a very intuitive approach and its weaknesses: this approach treats a web site as one large HTML document and applies the well-known methods for page classification. Next, we show how accuracy can be improved by a preprocessing step that assigns each occurring web page to its most likely topic. The determined topics then represent the information the web site contains and can be used to classify it more accurately. We accomplish this in two ways. First, we apply well-established classification algorithms to a feature space of occurring topics. Second, we treat a site as a tree of occurring topics and use a Markov tree model for further classification.
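The first of these two directions, classifying a site from the topics assigned to its pages, can be sketched as a small multinomial naive Bayes model over topic counts. The topic labels, site categories, and training data below are invented for illustration; the abstract does not specify which classification algorithms the paper actually applies to the topic feature space.

```python
from collections import Counter
import math

# Hypothetical training data: each site is represented by the multiset
# of topics that the preprocessing step assigned to its pages.
TRAIN = [
    (["news", "sports", "news", "weather"], "portal"),
    (["news", "politics", "news"],          "portal"),
    (["product", "checkout", "product"],    "shop"),
    (["product", "shipping", "checkout"],   "shop"),
]

def train_nb(examples, alpha=1.0):
    """Fit a multinomial naive Bayes over per-site topic counts."""
    vocab = {t for topics, _ in examples for t in topics}
    by_label = {}
    for topics, label in examples:
        by_label.setdefault(label, Counter()).update(topics)
    priors = Counter(label for _, label in examples)
    n = len(examples)
    model = {}
    for label, counts in by_label.items():
        total = sum(counts.values())
        # Laplace-smoothed log-likelihood of each topic given the label
        log_like = {t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
                    for t in vocab}
        model[label] = (math.log(priors[label] / n), log_like)
    return model, vocab

def classify(model, vocab, topics):
    """Return the most likely site category for a bag of page topics."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in topics if t in vocab)
    return max(model, key=score)

model, vocab = train_nb(TRAIN)
print(classify(model, vocab, ["product", "product", "shipping"]))  # -> shop
```

The second direction described in the abstract replaces this flat bag of topics with the site's link tree of topics and scores it with a Markov tree model, which additionally captures which topics tend to appear below which others.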
To improve the efficiency of this approach, we additionally introduce a powerful pruning method that reduces the number of web pages considered. Our experiments show the superiority of the Markov tree approach with respect to classification accuracy. In particular, we demonstrate that our pruning method not only reduces processing time but also improves classification accuracy.
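The quadrat-count estimator mentioned at the start of the abstract can be illustrated in a few lines. In ecology, one counts individuals in small sampled plots (quadrats) and scales the mean density up to the whole region; the numbers below are invented, and how the paper maps quadrats onto regions of the Web (for example, slices of an address space or of a search engine index) is not detailed here.

```python
import statistics

def quadrat_estimate(counts, quadrat_area, total_area):
    """Estimate a population size from quadrat samples.

    counts:       individuals observed in each sampled quadrat
    quadrat_area: area covered by a single quadrat
    total_area:   area of the whole region being surveyed
    """
    density = statistics.mean(counts) / quadrat_area  # individuals per unit area
    return density * total_area

# Example: 5 quadrats of 1 unit^2 each, sampled from a 100 unit^2 region
print(quadrat_estimate([3, 5, 4, 6, 2], 1.0, 100.0))  # -> 400.0
```

The estimate is only as good as the sampling: quadrats must be placed randomly, and a highly clustered population (as web pages arguably are) inflates the variance of the estimate.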

Index Terms

Computer Science
Information Sciences

Keywords

WWW Size Estimation Using Biological Techniques
Modelling Methodologies
Web Application
Web Site
Hypermedia
Hypertext