A Novel Technique for Data Extraction from Hidden Web Databases

Anuradha; A.K.Sharma

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 15 - Number 4

Year of Publication: 2011

Authors: Anuradha, A.K.Sharma

DOI: 10.5120/1933-2579

Anuradha, A.K.Sharma. A Novel Technique for Data Extraction from Hidden Web Databases. International Journal of Computer Applications. 15, 4 (February 2011), 45-48. DOI=10.5120/1933-2579

@article{ 10.5120/1933-2579,
author = { Anuradha and A.K.Sharma },
title = { A Novel Technique for Data Extraction from Hidden Web Databases },
journal = { International Journal of Computer Applications },
issue_date = { February 2011 },
volume = { 15 },
number = { 4 },
month = { February },
year = { 2011 },
issn = { 0975-8887 },
pages = { 45-48 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume15/number4/1933-2579/ },
doi = { 10.5120/1933-2579 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:03:18.801996+05:30
%A Anuradha
%A A.K.Sharma
%T A Novel Technique for Data Extraction from Hidden Web Databases
%J International Journal of Computer Applications
%@ 0975-8887
%V 15
%N 4
%P 45-48
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

A large amount of the information on the Web is stored in backend databases that traditional search engines do not index. Such databases are referred to as hidden web databases. Extracting their content is a promising research area because the pages are created dynamically through search query interfaces. However, querying each search interface directly is a laborious way to search. Hence, there has been increased interest in retrieving and integrating hidden web data in order to give web users high-quality information. Because the results returned in response to a user query are typically presented using template-generated Web pages, this paper proposes a novel approach that identifies Web page templates and the tag structures of a document in order to extract structured data from hidden web sources.
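The idea of exploiting template-generated result pages can be illustrated with a minimal sketch. It is not the authors' algorithm, which this page does not describe in detail; it only assumes that text appearing repeatedly under the same tag path (for example html/body/ul/li/b) was probably filled in by a template. The names TagPathCollector, extract_repeated_fields, min_repeats, and the sample page are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Record the tag path (e.g. 'html/body/ul/li') of every text node."""

    def __init__(self):
        super().__init__()
        self.stack = []  # tags that are currently open
        self.texts_by_path = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.texts_by_path["/".join(self.stack)].append(text)


def extract_repeated_fields(html, min_repeats=2):
    """Return {tag_path: [texts]} for paths seen at least min_repeats times.

    Assumption (hypothetical heuristic): a repeated tag path marks a
    template slot, so its texts are candidate field values of the records.
    """
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return {
        path: texts
        for path, texts in collector.texts_by_path.items()
        if len(texts) >= min_repeats
    }


# Hypothetical result page such as a hidden web source might return.
sample_page = """
<html><body>
<h1>Results</h1>
<ul>
  <li><b>Book A</b><span>$10</span></li>
  <li><b>Book B</b><span>$12</span></li>
</ul>
</body></html>
"""

fields = extract_repeated_fields(sample_page)
for path, texts in fields.items():
    print(path, texts)
```

In this sketch the page heading is discarded because its tag path occurs only once, while the two repeated paths under each list item are kept as candidate record fields. A real extractor would also need to align fields into records and cope with optional or nested fields.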

Index Terms

Computer Science

Information Sciences

Keywords

Hidden web, Deep web, Global interface, Hidden web crawlers, Surface web