Research Article

A Novel Technique for Data Extraction from Hidden Web Databases

by Anuradha, A.K.Sharma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 15 - Number 4
Year of Publication: 2011
Authors: Anuradha, A.K.Sharma
10.5120/1933-2579

Anuradha, A.K.Sharma. A Novel Technique for Data Extraction from Hidden Web Databases. International Journal of Computer Applications. 15, 4 (February 2011), 45-48. DOI=10.5120/1933-2579

@article{ 10.5120/1933-2579,
author = { Anuradha, A.K.Sharma },
title = { A Novel Technique for Data Extraction from Hidden Web Databases },
journal = { International Journal of Computer Applications },
issue_date = { February 2011 },
volume = { 15 },
number = { 4 },
month = { February },
year = { 2011 },
issn = { 0975-8887 },
pages = { 45-48 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume15/number4/1933-2579/ },
doi = { 10.5120/1933-2579 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:03:18.801996+05:30
%A Anuradha
%A A.K.Sharma
%T A Novel Technique for Data Extraction from Hidden Web Databases
%J International Journal of Computer Applications
%@ 0975-8887
%V 15
%N 4
%P 45-48
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A large amount of the information on the Web is stored in backend databases that are not indexed by traditional search engines. Such databases are referred to as hidden web databases, and extracting their content is a promising research area because the pages are created dynamically through search query interfaces. However, querying each search interface directly is a laborious way to search. Hence, there has been increased interest in retrieving and integrating hidden web data in order to give high-quality information to the web user. Because the results returned in response to a user query are typically presented using template-generated Web pages, this paper proposes a novel approach that identifies Web page templates and the tag structures of a document in order to extract structured data from hidden web sources.
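
The template idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given on this page; it only assumes that a template-generated result page repeats the same tag path for every record, so text nodes sharing a tag path that occurs several times are likely data fields. The names `TagPathCollector`, `extract_repeated_fields`, and `min_repeats`, and the sample page, are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagPathCollector(HTMLParser):
    """Collects each text node together with the tag path leading to it."""

    def __init__(self):
        super().__init__()
        self.stack = []                     # currently open tags
        self.by_path = defaultdict(list)    # tag path -> text values

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the innermost matching open tag (tolerates unclosed tags).
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.by_path["/".join(self.stack)].append(text)


def extract_repeated_fields(html, min_repeats=2):
    """Return text values grouped by tag path, keeping only repeated paths.

    Assumption: a path seen at least `min_repeats` times belongs to the
    repeated (record) region of the page template.
    """
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return {path: values
            for path, values in collector.by_path.items()
            if len(values) >= min_repeats}


# Hypothetical result page as a search interface might return it.
page = """
<html><body>
<h1>Results</h1>
<table>
<tr><td><b>Book A</b></td><td>10</td></tr>
<tr><td><b>Book B</b></td><td>12</td></tr>
</table>
</body></html>
"""

fields = extract_repeated_fields(page)
for path, values in fields.items():
    print(path, values)
```

The heading appears only once, so its path is filtered out, while the two table-cell paths survive as candidate fields. A real extractor would also need to align fields into records and cope with optional or nested fields, which this sketch does not attempt.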

Index Terms

Computer Science
Information Sciences

Keywords

Hidden web, Deep web, Global interface, Hidden web crawlers, Surface web