Research Article

Design and Implementation of Hidden based Web Retrieval using Innovative Vision-based Segmentation

by Kopal Maheshwari, Namrata Tapaswi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 98 - Number 9
Year of Publication: 2014
Authors: Kopal Maheshwari, Namrata Tapaswi
DOI: 10.5120/17215-7448

Kopal Maheshwari, Namrata Tapaswi. Design and Implementation of Hidden based Web Retrieval using Innovative Vision-based Segmentation. International Journal of Computer Applications. 98, 9 (July 2014), 42-47. DOI=10.5120/17215-7448

@article{ 10.5120/17215-7448,
author = { Kopal Maheshwari and Namrata Tapaswi },
title = { Design and Implementation of Hidden based Web Retrieval using Innovative Vision-based Segmentation },
journal = { International Journal of Computer Applications },
issue_date = { July 2014 },
volume = { 98 },
number = { 9 },
month = { July },
year = { 2014 },
issn = { 0975-8887 },
pages = { 42-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume98/number9/17215-7448/ },
doi = { 10.5120/17215-7448 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:25:48.474676+05:30
%A Kopal Maheshwari
%A Namrata Tapaswi
%T Design and Implementation of Hidden based Web Retrieval using Innovative Vision-based Segmentation
%J International Journal of Computer Applications
%@ 0975-8887
%V 98
%N 9
%P 42-47
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

We integrate the information extracted from conference websites to obtain clean, high-quality academic data. This research makes the following contributions. First, we propose a novel vision-based page segmentation algorithm that uses the DOM tree to compensate for the information loss of the classical vision-based segmentation algorithm. Second, we transform the difficult problem of extracting information from conference Web pages into a classification problem, and classify text blocks into predefined categories according to visual, keyword, and text content information. Third, we improve the classification quality by post-processing. Our experimental results on real-world datasets show that our method is highly effective and efficient for extracting academic information from conference pages.
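
The pipeline outlined in the abstract (segment the page with help from the DOM tree, classify the text blocks, then post-process the labels) can be illustrated with a minimal sketch. This is not the authors' algorithm: true vision-based segmentation needs rendered layout data, so block-level DOM tags stand in for visual cues here, and the tag set, category names, keyword lists, and smoothing rule are all hypothetical choices made only for illustration.

```python
from html.parser import HTMLParser

# Assumption: these block-level tags mark segment boundaries.
BLOCK_TAGS = {"div", "p", "li", "h1", "h2", "h3", "td", "section"}

# Assumption: illustrative categories and keyword cues, not taken from the paper.
KEYWORDS = {
    "dates": ("deadline", "submission due", "notification"),
    "topics": ("topics of interest", "call for papers"),
    "committee": ("chair", "committee"),
}


class BlockSegmenter(HTMLParser):
    """Split a page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished (tag, text) pairs
        self._stack = []   # currently open block-level tags
        self._buffer = []  # text pieces of the block being built

    def flush(self):
        # Normalize whitespace and store the block if it has any text.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            tag = self._stack[-1] if self._stack else "body"
            self.blocks.append((tag, text))
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and tag in self._stack:
            self.flush()
            # Pop up to and including the matching tag (tolerates sloppy HTML).
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        self._buffer.append(data)


def segment(html):
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text after the last tag
    return parser.blocks


def classify(text):
    # Keyword-only classifier; the tag kept in each block is a hook
    # where layout features could be added.
    lowered = text.lower()
    for label, cues in KEYWORDS.items():
        if any(cue in lowered for cue in cues):
            return label
    return "other"


def smooth(labels):
    # Post-processing: an "other" block between two blocks that share a
    # label inherits that label.
    fixed = list(labels)
    for i in range(1, len(labels) - 1):
        if labels[i] == "other" and labels[i - 1] == labels[i + 1] != "other":
            fixed[i] = labels[i - 1]
    return fixed


def extract(html):
    blocks = segment(html)
    labels = smooth([classify(text) for _, text in blocks])
    return [(label, text) for label, (_, text) in zip(labels, blocks)]
```

Calling `extract(html)` on a conference page returns a list of `(label, text)` pairs. The smoothing step shows why post-processing can help: a list item that holds only a date has no keyword of its own, but it takes the label of its neighbours when they agree.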

Index Terms

Computer Science
Information Sciences

Keywords

Design Implementation