Research Article

Restructuring robots.txt for better Information Retrieval

by Bhavin M. Jasani, C. K. Kumbharana
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 120 - Number 9
Year of Publication: 2015
Authors: Bhavin M. Jasani, C. K. Kumbharana
10.5120/21258-4115

Bhavin M. Jasani, C. K. Kumbharana. Restructuring robots.txt for better Information Retrieval. International Journal of Computer Applications. 120, 9 (June 2015), 35-40. DOI=10.5120/21258-4115

@article{ 10.5120/21258-4115,
author = { Bhavin M. Jasani and C. K. Kumbharana },
title = { Restructuring robots.txt for better Information Retrieval },
journal = { International Journal of Computer Applications },
issue_date = { June 2015 },
volume = { 120 },
number = { 9 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 35-40 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume120/number9/21258-4115/ },
doi = { 10.5120/21258-4115 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:05:49.150704+05:30
%A Bhavin M. Jasani
%A C. K. Kumbharana
%T Restructuring robots.txt for better Information Retrieval
%J International Journal of Computer Applications
%@ 0975-8887
%V 120
%N 9
%P 35-40
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Nowadays the users of the WWW are not only humans. Other visitors include web crawlers and robots deployed by search engines and other information retrievers. Far fewer visitors arrive at a website directly than reach it through search engines or other links. To collect information from a website, search engines use crawlers or robots to access it. There must therefore be an access mechanism or protocol that restricts such robots from accessing unwanted content of the website. robots.txt provides a partial mechanism for this but is not fully functional. This paper proposes enhancements to make full use of the functionality of the robots.txt file.
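
As a rough illustration of the existing mechanism the abstract refers to (not the enhancement proposed in the paper), the sketch below shows how a well-behaved crawler might consult robots.txt rules before fetching a page. It uses Python's standard `urllib.robotparser` module. The rules, the agent names `ExampleCrawler` and `BadBot`, and the paths are hypothetical and chosen only for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler falls under the "*" rules.
print(parser.can_fetch("ExampleCrawler", "/index.html"))
print(parser.can_fetch("ExampleCrawler", "/private/report.html"))

# A crawler named explicitly is governed by its own record.
print(parser.can_fetch("BadBot", "/index.html"))
```

Compliance is voluntary: the file only states the site owner's wishes, and a robot that ignores it is not technically prevented from accessing the listed paths.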

References
  1. A Standard for Robot Exclusion: http://www.robotstxt.org/orig.html
  2. Standard for the Format of ARPA Internet Text Messages: https://www.ietf.org/rfc/rfc0822.txt
  3. The Web Robots Pages: http://www.robotstxt.org/
  4. W3C: http://www.w3.org/
  5. Timestamp: http://en.wikipedia.org/wiki/Timestamp
  6. Backus–Naur Form: http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
Index Terms

Computer Science
Information Sciences

Keywords

Crawling agents, robots, spammer, harvesters, User Agent tag, Directive Overriding, Web Crawling, Web Tree, Web Spamming, Crawling, Querying.