Research Article

Priority based Semantic Web Crawler

by Jaytrilok Choudhary, Devshri Roy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 81 - Number 15
Year of Publication: 2013
Authors: Jaytrilok Choudhary, Devshri Roy
10.5120/14197-2372

Jaytrilok Choudhary, Devshri Roy. Priority based Semantic Web Crawler. International Journal of Computer Applications. 81, 15 (November 2013), 10-13. DOI=10.5120/14197-2372

@article{ 10.5120/14197-2372,
author = { Jaytrilok Choudhary and Devshri Roy },
title = { Priority based Semantic Web Crawler },
journal = { International Journal of Computer Applications },
issue_date = { November 2013 },
volume = { 81 },
number = { 15 },
month = { November },
year = { 2013 },
issn = { 0975-8887 },
pages = { 10-13 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume81/number15/14197-2372/ },
doi = { 10.5120/14197-2372 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:56:07.301220+05:30
%A Jaytrilok Choudhary
%A Devshri Roy
%T Priority based Semantic Web Crawler
%J International Journal of Computer Applications
%@ 0975-8887
%V 81
%N 15
%P 10-13
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The Internet has billions of web pages, and these pages are linked to each other using URLs (Uniform Resource Locators). A web crawler is a main module of a search engine that gathers these documents from the WWW. Most web pages on the Internet are active and change periodically, so the crawler must revisit them to keep the search engine's database up to date. In this paper, a priority based semantic web crawling algorithm is proposed. An ontology is used to capture the semantics of a web page during the crawling process. The algorithm starts with an initial seed URL. The web page at the given URL is downloaded from the Internet, and its semantic score with respect to the given topic is calculated. The semantic score of an unvisited URL is calculated from its anchor text semantic similarity score, the semantic similarity score of the web page of the unvisited URL with the given topic, and the semantic scores of its parent pages. A priority queue, rather than a simple queue, stores each URL together with its semantic score, so each time the queue returns the highest-priority URL to crawl next. The overall performance gain is 88% over a simple crawler, 28% over focused crawling, and 6% over a priority based focused crawler.
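
The priority-queue crawl loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `similarity` function is a plain word-overlap (Jaccard) measure standing in for the ontology-based semantic similarity, the equal 0.5 weights are assumed, the "web" is a hypothetical in-memory dictionary rather than downloaded pages, and only the anchor text score and the score of the single parent page on which a link is first seen are combined.

```python
import heapq


def similarity(text, topic_terms):
    """Word-overlap (Jaccard) score; an assumed stand-in for the
    ontology-based semantic similarity used in the paper."""
    words = set(text.lower().split())
    if not words or not topic_terms:
        return 0.0
    return len(words & topic_terms) / len(words | topic_terms)


def crawl(seed, pages, topic_terms, max_pages=10):
    """Crawl a hypothetical in-memory web.

    pages maps url -> (page_text, [(anchor_text, target_url), ...]).
    Returns the URLs in the order they were visited.
    """
    # heapq is a min-heap, so scores are negated to pop the highest first.
    frontier = [(-1.0, seed)]
    seen = {seed}
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in pages:
            continue  # nothing to "download" for this URL
        text, links = pages[url]
        page_score = similarity(text, topic_terms)
        visited.append(url)
        for anchor, target in links:
            if target in seen:
                continue
            seen.add(target)
            # Assumed weighting: half anchor text score, half parent page score.
            link_score = 0.5 * similarity(anchor, topic_terms) + 0.5 * page_score
            heapq.heappush(frontier, (-link_score, target))
    return visited


if __name__ == "__main__":
    topic = {"semantic", "web", "crawler"}
    toy_web = {
        "s": ("web crawler search", [("cooking recipes", "a"),
                                     ("semantic web crawler", "b")]),
        "a": ("cooking", []),
        "b": ("semantic crawler", []),
    }
    print(crawl("s", toy_web, topic))
```

In this toy web, a simple FIFO queue would visit the off-topic page `a` before `b` because its link appears first, whereas the priority queue visits `b` first because its anchor text matches the topic more closely.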

Index Terms

Computer Science
Information Sciences

Keywords

Priority, ontology, semantic similarity, downloader, search engine