Research Article

Hybrid Web-page Segmentation and Block Extraction for Small Screen Terminals

Published on January 2014 by Shefali Singhal, Neha Garg
International IT Summit Confluence 2013-The Next Generation Information Technology Summit
Foundation of Computer Science USA
CONFLUENCE2013 - Number 2

Shefali Singhal, Neha Garg. Hybrid Web-page Segmentation and Block Extraction for Small Screen Terminals. International IT Summit Confluence 2013-The Next Generation Information Technology Summit. CONFLUENCE2013, 2 (January 2014), 12-15.

@article{
author = { Shefali Singhal and Neha Garg },
title = { Hybrid Web-page Segmentation and Block Extraction for Small Screen Terminals },
journal = { International IT Summit Confluence 2013-The Next Generation Information Technology Summit },
issue_date = { January 2014 },
volume = { CONFLUENCE2013 },
number = { 2 },
month = { January },
year = { 2014 },
issn = { 0975-8887 },
pages = { 12-15 },
numpages = 4,
url = { /proceedings/confluence2013/number2/15119-1311/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International IT Summit Confluence 2013-The Next Generation Information Technology Summit
%A Shefali Singhal
%A Neha Garg
%T Hybrid Web-page Segmentation and Block Extraction for Small Screen Terminals
%J International IT Summit Confluence 2013-The Next Generation Information Technology Summit
%@ 0975-8887
%V CONFLUENCE2013
%N 2
%P 12-15
%D 2014
%I International Journal of Computer Applications
Abstract

Web page representation is a concern for small-screen devices such as mobile phones and palmtop devices. On a web page, a bulk of irrelevant data, including advertisements and other noisy information, makes access inconvenient. Web page segmentation is a technique that addresses this problem by logically dividing a web page into segments. These segments can be created using the DOM (Document Object Model) and VIPS (Vision-based Page Segmentation) techniques. In this paper, a hybrid method of web page segmentation is designed that combines the DOM method and the VIPS algorithm to develop segments from a web page. Both the structural and the visual aspects of a web page are considered when creating a segment. A segment is a basic unit of a web page that cannot be divided further. Segments are obtained by processing a web page through a Block Creation Algorithm, which is discussed in the paper.
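
To make the idea of an indivisible segment concrete, the following is a minimal sketch of the structural (DOM) half only, written with Python's standard `html.parser`. It is not the paper's Block Creation Algorithm, which is not given in the abstract. The names `BLOCK_TAGS`, `LeafBlockExtractor`, and `segment` are hypothetical, and the choice of which tags count as block containers is an assumption. The visual cues that VIPS relies on (layout, separators, font and colour changes) need a rendered page and are left out.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as structural block containers.
BLOCK_TAGS = {"div", "section", "article", "table", "td", "ul", "ol", "li", "p"}


class LeafBlockExtractor(HTMLParser):
    """Collect the text of block elements that contain no nested block element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one [tag, text_parts, has_block_child] entry per open block
        self.blocks = []  # extracted (tag, text) segments in document order

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            if self.stack:
                self.stack[-1][2] = True  # the enclosing block can still be divided
            self.stack.append([tag, [], False])

    def handle_endtag(self, tag):
        # Only well-nested block tags are handled in this sketch.
        if tag in BLOCK_TAGS and self.stack and self.stack[-1][0] == tag:
            _, parts, has_block_child = self.stack.pop()
            text = " ".join(parts)
            if not has_block_child and text:
                self.blocks.append((tag, text))

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text and self.stack:
            self.stack[-1][1].append(text)


def segment(html):
    """Return the leaf blocks of an HTML string as (tag, text) pairs."""
    parser = LeafBlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


page = """
<div id="main">
  <div class="ad"><p>Buy now!</p></div>
  <div class="story"><p>First paragraph.</p><p>Second <b>bold</b> paragraph.</p></div>
</div>
"""

for tag, text in segment(page):
    print(tag, "->", text)
```

In this sketch, text that sits directly inside a block which also has block children is discarded, and every leaf block is kept, including the advertisement. A hybrid method as described above would additionally use visual information to decide which blocks to merge, keep, or drop for a small screen.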

Index Terms

Computer Science
Information Sciences

Keywords

Web Page Segmentation, Block Extraction, Algorithms and System Architecture