A Survey of Automatic Deep Web Classification Techniques

Umara Noor; Zahid Rashid; Azhar Rauf

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 19 - Number 6

Year of Publication: 2011

Authors: Umara Noor, Zahid Rashid, Azhar Rauf

10.5120/2362-3099

Umara Noor, Zahid Rashid, Azhar Rauf. A Survey of Automatic Deep Web Classification Techniques. International Journal of Computer Applications. 19, 6 (April 2011), 43-50. DOI=10.5120/2362-3099

@article{10.5120/2362-3099,
  author = {Umara Noor and Zahid Rashid and Azhar Rauf},
  title = {A Survey of Automatic Deep Web Classification Techniques},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2011},
  volume = {19},
  number = {6},
  month = {April},
  year = {2011},
  issn = {0975-8887},
  pages = {43-50},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume19/number6/2362-3099/},
  doi = {10.5120/2362-3099},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:06:19.162314+05:30
%A Umara Noor
%A Zahid Rashid
%A Azhar Rauf
%T A Survey of Automatic Deep Web Classification Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 19
%N 6
%P 43-50
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Deep web technologies have attracted growing attention in the last few years as part of the vision for the next generation of the web, a prominent feature of which is the automation of tasks. A large part of the deep web consists of online, structured, domain-specific databases that are accessed through web query interfaces. The information contained in each of these databases relates to a particular domain. This highly relevant information is well suited to satisfying users' information needs and to large-scale deep web integration. To make this extraction and integration process easier, deep web databases must be classified into standard/non-standard category domains. Classification techniques are of two main types: manual and automatic. Because the deep web is growing at an exponential rate, it has become nearly impossible to classify deep web search sources manually into their respective domains. For this purpose, several automatic deep web classification techniques have been proposed in the literature. In this paper, in addition to surveying the literature, we propose a framework for analyzing automatic deep web classification techniques. The framework provides a baseline for analyzing the fundamentals of these techniques based on parameters such as structured/unstructured, simple/advanced query forms, content representative extraction methodology, level of classification, performance evaluation criteria, and results. Furthermore, we study a number of automatic deep web classification techniques in light of the proposed framework.

Index Terms

Computer Science

Information Sciences

Keywords

Deep web, web databases, data integration, domain concepts, survey