HWPDE: Novel Approach for Data Extraction from Structured Web Pages

Manpreet Singh Sehgal, Anuradha

Research Article

by Manpreet Singh Sehgal and Anuradha

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 50 - Number 8

Year of Publication: 2012

Authors: Manpreet Singh Sehgal, Anuradha

10.5120/7791-0897

Manpreet Singh Sehgal and Anuradha. HWPDE: Novel Approach for Data Extraction from Structured Web Pages. International Journal of Computer Applications 50, 8 (July 2012), 22-27. DOI=10.5120/7791-0897

@article{ 10.5120/7791-0897,
author = { Manpreet Singh Sehgal and Anuradha },
title = { HWPDE: Novel Approach for Data Extraction from Structured Web Pages },
journal = { International Journal of Computer Applications },
issue_date = { July 2012 },
volume = { 50 },
number = { 8 },
month = { July },
year = { 2012 },
issn = { 0975-8887 },
pages = { 22-27 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume50/number8/7791-0897/ },
doi = { 10.5120/7791-0897 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:47:46.250914+05:30
%A Manpreet Singh Sehgal
%A Anuradha
%T HWPDE: Novel Approach for Data Extraction from Structured Web Pages
%J International Journal of Computer Applications
%@ 0975-8887
%V 50
%N 8
%P 22-27
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Diving into the World Wide Web to fetch precious stones (relevant information) is a tedious task given the limitations of current diving equipment (current browsers). While much work is being done to improve that equipment, a related area of research is to devise new approaches to mining. This paper describes a novel approach for extracting data from hidden websites so that it can be offered to users as a free service for a better experience of searching for relevant data. In the proposed method, a crawler extracts the relevant data contained in the web pages of hidden websites and stores it in a local database, building a large repository of structured, indexed, and ultimately relevant data. Such extracted data has the potential to satisfy end users who are starved of relevant information.
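The abstract does not give the details of the HWPDE algorithm, so the following is only a minimal sketch of the general idea it describes: pulling records out of a structured result page and storing them in an indexed local database. It is not the authors' method. It assumes the page presents its records as an HTML table with a header row, and the names `TableRowParser`, `extract_records`, `store_records`, the `records` table schema, and the sample page content are all hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical result page; a real crawler would fetch this from a hidden web source.
SAMPLE_PAGE = """
<table>
  <tr><th>Title</th><th>Author</th><th>Year</th></tr>
  <tr><td>Example Book A</td><td>Author X</td><td>2001</td></tr>
  <tr><td>Example Book B</td><td>Author Y</td><td>2005</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Turn the first row into lowercase field names and the rest into dicts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header = [name.lower() for name in parser.rows[0]]
    return [dict(zip(header, row)) for row in parser.rows[1:]
            if len(row) == len(header)]


def store_records(conn, records):
    """Store records in a local table (assumed schema) with an index on author."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (title TEXT, author TEXT, year INTEGER)")
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_records_author ON records (author)")
    conn.executemany(
        "INSERT INTO records VALUES (:title, :author, :year)", records)
    conn.commit()
```

A caller would pass the fetched page to `extract_records`, open a database with `sqlite3.connect`, and hand both to `store_records`; later searches then run against the local index rather than the remote site.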

Index Terms

Computer Science

Information Sciences

Keywords

Hidden Web, Web page Extraction, Web Page Service