Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree

Khaing Wah Wah Linn

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 108 - Number 17

Year of Publication: 2014

Authors: Khaing Wah Wah Linn

DOI: 10.5120/19006-0547

Khaing Wah Wah Linn. Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree. International Journal of Computer Applications. 108, 17 (December 2014), 34-37. DOI=10.5120/19006-0547

@article{ 10.5120/19006-0547,
  author = { Khaing Wah Wah Linn },
  title = { Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree },
  journal = { International Journal of Computer Applications },
  issue_date = { December 2014 },
  volume = { 108 },
  number = { 17 },
  month = { December },
  year = { 2014 },
  issn = { 0975-8887 },
  pages = { 34-37 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume108/number17/19006-0547/ },
  doi = { 10.5120/19006-0547 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:43:37.223874+05:30
%A Khaing Wah Wah Linn
%T Retrieve Main Content using Vision-base Web Page Segmentation with Gomory-Hu Tree
%J International Journal of Computer Applications
%@ 0975-8887
%V 108
%N 17
%P 34-37
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The World Wide Web (WWW) serves as a huge, widely distributed global information service, and a huge amount of data has been accumulated and stored on it. Information on the Web is usually presented via Hypertext Markup Language (HTML) to make it easier for humans to perceive. Web pages usually contain various contents, some relevant and some irrelevant to the main topic. Irrelevant contents are called noise. A web page usually contains a number of noise items that are not related to its main information, such as navigation bars, advertisements, and related articles. Noise on web pages makes it difficult to mine the main content of these pages. This paper proposes web page segmentation using a Gomory-Hu tree based Vision-based Page Segmentation (VIPS) algorithm.
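The abstract does not spell out how the Gomory-Hu tree is used, so the following is only a minimal sketch of one plausible reading: visual blocks become graph nodes, edge weights express how strongly two blocks belong together, and removing the lightest edges of the Gomory-Hu tree splits the page into segments. The block names, the weights, and all function names are hypothetical, and the tree is built with Gusfield's construction over a plain Edmonds-Karp max-flow, which is an assumption rather than the paper's stated method.

```python
from collections import deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max flow on an undirected capacity matrix.
    Returns (flow value, set of nodes on the s side of a minimum cut)."""
    n = len(cap)
    res = [row[:] for row in cap]  # residual capacities
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left: every node the search reached is on the s side.
            return flow, {v for v in range(n) if parent[v] != -1}
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck

def gomory_hu_tree(cap):
    """Gusfield's construction; returns tree edges as (node, parent, min-cut weight)."""
    n = len(cap)
    parent = [0] * n
    weight = [0] * n
    for s in range(1, n):
        t = parent[s]
        f, side = max_flow_min_cut(cap, s, t)
        weight[s] = f
        for i in range(n):
            if i != s and i in side and parent[i] == t:
                parent[i] = s
        if parent[t] in side:
            parent[s], parent[t] = parent[t], s
            weight[s], weight[t] = weight[t], f
    return [(v, parent[v], weight[v]) for v in range(1, n)]

def cluster_blocks(cap, k):
    """Drop the k-1 lightest tree edges and return the resulting node groups."""
    n = len(cap)
    edges = sorted(gomory_hu_tree(cap), key=lambda e: e[2], reverse=True)
    kept = edges[: len(edges) - (k - 1)]
    label = list(range(n))  # union-find over the kept edges

    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]
            x = label[x]
        return x

    for u, v, _ in kept:
        label[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

def build_capacity(n, weighted_edges):
    """Symmetric capacity matrix from (u, v, weight) triples."""
    cap = [[0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        cap[u][v] += w
        cap[v][u] += w
    return cap

# Hypothetical page: three content blocks and two noise blocks, weakly linked.
blocks = ["headline", "body text", "figure", "nav bar", "advert"]
links = [(0, 1, 5), (0, 2, 5), (1, 2, 5), (3, 4, 5), (2, 3, 1)]
capacity = build_capacity(len(blocks), links)
clusters = cluster_blocks(capacity, 2)
print([[blocks[i] for i in c] for c in clusters])
# [['headline', 'body text', 'figure'], ['nav bar', 'advert']]
```

In this toy graph the single weak link between "figure" and "nav bar" becomes the lightest tree edge, so cutting it separates the content blocks from the noise blocks. How the similarity weights would be derived from VIPS visual cues is not covered by the abstract and is left open here.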

Index Terms

Computer Science

Information Sciences

Keywords

Web Page Segmentation, Vision-based Page Segmentation, Gomory-Hu tree