HWPDE: Novel Approach for Data Extraction from Structured Web Pages

International Journal of Computer Applications
© 2012 by IJCA Journal
Volume 50 - Number 8
Year of Publication: 2012
Authors:
Manpreet Singh Sehgal
Anuradha
DOI: 10.5120/7791-0897

Manpreet Singh Sehgal and Anuradha. HWPDE: Novel Approach for Data Extraction from Structured Web Pages. International Journal of Computer Applications 50(8):22-27, July 2012. Full text available.
BibTeX

@article{key:article,
	author = {Manpreet Singh Sehgal and Anuradha},
	title = {HWPDE: Novel Approach for Data Extraction from Structured Web Pages},
	journal = {International Journal of Computer Applications},
	year = {2012},
	volume = {50},
	number = {8},
	pages = {22-27},
	month = {July},
	note = {Full text available}
}

Abstract

Diving into the World Wide Web to fetch precious stones (relevant information) is a tedious task given the limitations of current diving equipment (current browsers). While much work is under way to improve that equipment, a related area of research is to devise new approaches to mining. This paper describes a novel approach to extracting data from hidden websites so that it can be offered as a free service that improves users' experience of searching for relevant data. In the proposed method, a crawler extracts the relevant data contained in the web pages of hidden websites and stores it in a local database, building a large repository of structured, indexed and ultimately relevant data. Such extracted data has the potential to satisfy end users starved of relevant information.
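
The extract-and-store pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the HWPDE algorithm itself, which the abstract does not specify. It assumes the data records sit in a plain HTML table whose first row holds the field names, and it uses only Python's standard library. The names TableRowParser, extract_records, store_records, SAMPLE_PAGE and the records table with its title and price columns are hypothetical, chosen for illustration only.

```python
import sqlite3
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html):
    """Turn a table page into a list of dicts keyed by the header row."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def store_records(conn, records):
    """Store extracted records in a local table (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (title TEXT, price TEXT)")
    conn.executemany(
        "INSERT INTO records (title, price) VALUES (:title, :price)", records
    )
    conn.commit()


# Hypothetical result page standing in for a hidden-web response.
SAMPLE_PAGE = """
<table>
  <tr><th>title</th><th>price</th></tr>
  <tr><td>Book A</td><td>10</td></tr>
  <tr><td>Book B</td><td>12</td></tr>
</table>
"""

records = extract_records(SAMPLE_PAGE)
conn = sqlite3.connect(":memory:")  # in-memory stand-in for the local database
store_records(conn, records)
stored = conn.execute("SELECT title, price FROM records ORDER BY title").fetchall()
```

A real crawler would also have to submit search forms to reach hidden pages and locate the record region within each page; both steps are outside this sketch.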
