Research Article

Article:Crawler Indexing using Tree Structure and its Implementation

by Deepika Sharma, Parul Gupta, Dr. A.K. Sharma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 31 - Number 6
Year of Publication: 2011
Authors: Deepika Sharma, Parul Gupta, Dr. A.K. Sharma
10.5120/3830-5323

Deepika Sharma, Parul Gupta, Dr. A.K. Sharma. Article:Crawler Indexing using Tree Structure and its Implementation. International Journal of Computer Applications. 31, 6 (October 2011), 34-39. DOI=10.5120/3830-5323

@article{ 10.5120/3830-5323,
author = { Deepika Sharma and Parul Gupta and Dr. A.K. Sharma },
title = { Article:Crawler Indexing using Tree Structure and its Implementation },
journal = { International Journal of Computer Applications },
issue_date = { October 2011 },
volume = { 31 },
number = { 6 },
month = { October },
year = { 2011 },
issn = { 0975-8887 },
pages = { 34-39 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume31/number6/3830-5323/ },
doi = { 10.5120/3830-5323 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:17:27.270068+05:30
%A Deepika Sharma
%A Parul Gupta
%A Dr. A.K. Sharma
%T Article:Crawler Indexing using Tree Structure and its Implementation
%J International Journal of Computer Applications
%@ 0975-8887
%V 31
%N 6
%P 34-39
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The plentiful content of the World-Wide Web is useful to millions. Information seekers typically begin their Web activity with a search engine such as Google or Yahoo. Our aim is to build a search tool that is cost-effective, efficient, fast and user-friendly. In response to a query, it should retrieve the most relevant information stored in the database. It should also be portable, so that it can be deployed on any platform without cost or inconvenience. Our goal is a Web search engine that retrieves the best-matched web pages in the shortest possible time. This paper proposes a crawler algorithm in which the crawler crawls web pages recursively and stores the relevant data in the database. The algorithm applies the basic principles of a tree structure to maintain the crawled data for use by the search engine. It uses a tree/node structure in the database that filters the searched word more efficiently and returns results to the user faster, making searching on the Web more efficient. The paper also implements crawler indexing with the tree structure using ‘HTML based Update File at Web Server’, making crawling and searching more efficient.
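The abstract does not say which tree structure the authors use or how their database is laid out, so the following is only a rough sketch of the general idea, not the paper's algorithm. It assumes a character trie as the tree/node structure and a small in-memory dictionary (the hypothetical `PAGES`) standing in for real web pages. The names `TrieNode`, `TrieIndex` and `crawl` are illustrative and do not come from the paper.

```python
class TrieNode:
    """One node of the index tree: child nodes keyed by character."""

    def __init__(self):
        self.children = {}
        self.urls = set()  # pages containing the word that ends at this node


class TrieIndex:
    """Assumed tree structure: a character trie mapping words to page URLs."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, word, url):
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.urls.add(url)

    def lookup(self, word):
        # Walk down one character at a time; stop as soon as a branch is missing.
        node = self.root
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.urls)


def crawl(url, pages, index, visited=None):
    """Recursively visit a page and its links, indexing every word found."""
    if visited is None:
        visited = set()
    if url in visited or url not in pages:
        return visited
    visited.add(url)
    text, links = pages[url]
    for word in text.split():
        index.add(word, url)
    for link in links:
        crawl(link, pages, index, visited)
    return visited


# Hypothetical stand-in for the Web: url -> (page text, outgoing links).
PAGES = {
    "a.html": ("tree index crawler", ["b.html", "c.html"]),
    "b.html": ("search engine index", ["a.html"]),
    "c.html": ("tree structure", []),
}

index = TrieIndex()
visited = crawl("a.html", PAGES, index)
print(sorted(index.lookup("tree")))
```

A lookup touches at most one node per character of the query word, which is one way a tree-shaped index can narrow a search quickly. A real crawler would fetch pages over HTTP and persist the nodes in a database, both of which are omitted here.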

Index Terms

Computer Science
Information Sciences

Keywords

Crawler, Indexing, Tree Structure, World-Wide Web