Research Article

Deep Web Mining: A Gold Mine

Published in August 2011 by Tejaswini A. Bhosale, Priya B. Pandharbale
National Technical Symposium on Advancements in Computing Technologies
Foundation of Computer Science USA
NTSACT - Number 4
August 2011
Authors: Tejaswini A. Bhosale, Priya B. Pandharbale

Tejaswini A. Bhosale, Priya B. Pandharbale. Deep Web Mining: A Gold Mine. National Technical Symposium on Advancements in Computing Technologies. NTSACT, 4 (August 2011), 6-11.

@article{
author = { Tejaswini A. Bhosale and Priya B. Pandharbale },
title = { Deep Web Mining: A Gold Mine },
journal = { National Technical Symposium on Advancements in Computing Technologies },
issue_date = { August 2011 },
volume = { NTSACT },
number = { 4 },
month = { August },
year = { 2011 },
issn = {0975-8887},
pages = { 6-11 },
numpages = 6,
url = { /proceedings/ntsact/number4/3210-ntst031/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Technical Symposium on Advancements in Computing Technologies
%A Tejaswini A. Bhosale
%A Priya B. Pandharbale
%T Deep Web Mining: A Gold Mine
%J National Technical Symposium on Advancements in Computing Technologies
%@ 0975-8887
%V NTSACT
%N 4
%P 6-11
%D 2011
%I International Journal of Computer Applications
Abstract

The deep Web contains far more information, and more valuable information, than the surface Web. However, making use of this consolidated information requires substantial effort, because deep Web pages are generated for visualization, not for data exchange. Extracting structured data from deep Web pages is therefore a challenging problem, owing to the intricate underlying structure of such pages. Extracting information from searchable Web sites has accordingly become a key step in Web information integration. We discuss some of the underlying problems and issues central to extending information retrieval systems.

Index Terms

Computer Science
Information Sciences

Keywords

Deep Web, Surface Web, Web mining