Research Article

A Survey: World Wide Web and the Search Engines

Published in February 2013 by Bharat Bhushan Agarwal, Sonia Gupta
International Conference on Advances in Computer Application 2013
Foundation of Computer Science USA
ICACA2013 - Number 1

Bharat Bhushan Agarwal, Sonia Gupta. A Survey: World Wide Web and the Search Engines. International Conference on Advances in Computer Application 2013. ICACA2013, 1 (February 2013), 40-43.

@article{ 10396-1016,
author = { Bharat Bhushan Agarwal and Sonia Gupta },
title = { A Survey: World Wide Web and the Search Engines },
journal = { International Conference on Advances in Computer Application 2013 },
issue_date = { February 2013 },
volume = { ICACA2013 },
number = { 1 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 40-43 },
numpages = 4,
url = { /proceedings/icaca2013/number1/10396-1016/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}
%0 Proceeding Article
%1 International Conference on Advances in Computer Application 2013
%A Bharat Bhushan Agarwal
%A Sonia Gupta
%T A Survey: World Wide Web and the Search Engines
%J International Conference on Advances in Computer Application 2013
%@ 0975-8887
%V ICACA2013
%N 1
%P 40-43
%D 2013
%I International Journal of Computer Applications
Abstract

The World Wide Web is one of the most popular and most rapidly growing aspects of the Internet. The ways in which computer scientists attempt to estimate its size range from educated guesses to extensive analyses of search engine databases. We present a new way of measuring the size of the World Wide Web using "quadrat counts", a technique used by biologists for population sampling.

There has been an exponential growth in the number of hypermedia and web modeling languages on the market. This growth has highlighted new problems and new areas of research. This paper categorizes and reviews the main hypermedia and web modeling languages, showing their origin and their primary focus, and concludes with recommendations for further research in this field.

When information is extracted automatically from the World Wide Web, most established methods focus on classifying single HTML documents. The problem of classifying complete web sites, however, has not yet been handled adequately, despite its importance for various applications. This paper therefore discusses the classification of complete web sites. First, we point out the main differences from page classification by discussing a very intuitive approach and its weaknesses: this approach treats a web site as one large HTML document and applies the well-known methods for page classification. Next, we show how accuracy can be improved by a preprocessing step that assigns each occurring web page to its most likely topic. The determined topics then represent the information the web site contains and can be used to classify it more accurately. We accomplish this in two ways. First, we apply well-established classification algorithms to a feature space of occurring topics. Second, we treat a site as a tree of occurring topics and use a Markov tree model for further classification.
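The first of these two directions, classifying a site from the topics assigned to its pages, can be sketched as a small multinomial naive Bayes model over topic counts. The topic labels, site categories, and training data below are invented for illustration; the abstract does not specify which classification algorithms the paper actually applies to the topic feature space.

```python
from collections import Counter
import math

# Hypothetical training data: each site is represented by the multiset
# of topics that the preprocessing step assigned to its pages.
TRAIN = [
    (["news", "sports", "news", "weather"], "portal"),
    (["news", "politics", "news"],          "portal"),
    (["product", "checkout", "product"],    "shop"),
    (["product", "shipping", "checkout"],   "shop"),
]

def train_nb(examples, alpha=1.0):
    """Fit a multinomial naive Bayes over per-site topic counts."""
    vocab = {t for topics, _ in examples for t in topics}
    by_label = {}
    for topics, label in examples:
        by_label.setdefault(label, Counter()).update(topics)
    priors = Counter(label for _, label in examples)
    n = len(examples)
    model = {}
    for label, counts in by_label.items():
        total = sum(counts.values())
        # Laplace-smoothed log-likelihood of each topic given the label
        log_like = {t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
                    for t in vocab}
        model[label] = (math.log(priors[label] / n), log_like)
    return model, vocab

def classify(model, vocab, topics):
    """Return the most likely site category for a bag of page topics."""
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in topics if t in vocab)
    return max(model, key=score)

model, vocab = train_nb(TRAIN)
print(classify(model, vocab, ["product", "product", "shipping"]))  # -> shop
```

The second direction described in the abstract replaces this flat bag of topics with the site's link tree of topics and scores it with a Markov tree model, which additionally captures which topics tend to appear below which others.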
To improve the efficiency of this approach, we additionally introduce a powerful pruning method that reduces the number of web pages considered. Our experiments show the superiority of the Markov tree approach with respect to classification accuracy. In particular, we demonstrate that our pruning method not only reduces processing time but also improves classification accuracy.
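The quadrat-count estimator mentioned at the start of the abstract can be illustrated in a few lines. In ecology, one counts individuals in small sampled plots (quadrats) and scales the mean density up to the whole region; the numbers below are invented, and how the paper maps quadrats onto regions of the Web (for example, slices of an address space or of a search engine index) is not detailed here.

```python
import statistics

def quadrat_estimate(counts, quadrat_area, total_area):
    """Estimate a population size from quadrat samples.

    counts:       individuals observed in each sampled quadrat
    quadrat_area: area covered by a single quadrat
    total_area:   area of the whole region being surveyed
    """
    density = statistics.mean(counts) / quadrat_area  # individuals per unit area
    return density * total_area

# Example: 5 quadrats of 1 unit^2 each, sampled from a 100 unit^2 region
print(quadrat_estimate([3, 5, 4, 6, 2], 1.0, 100.0))  # -> 400.0
```

The estimate is only as good as the sampling: quadrats must be placed randomly, and a highly clustered population (as web pages arguably are) inflates the variance of the estimate.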

Index Terms

Computer Science
Information Sciences

Keywords

WWW Size Estimation Using Biological Techniques
Modelling Methodologies
Web Application
Web Site
Hypermedia
Hypertext