Trinity for Web Data Extraction using Efficient Algorithm

Research Article

Published on December 2015 by Sayali Khodade, Roshani Ade

National Conference on Advances in Computing

Foundation of Computer Science USA

NCAC2015 - Number 1

December 2015

Authors: Sayali Khodade, Roshani Ade

Sayali Khodade, Roshani Ade. Trinity for Web Data Extraction using Efficient Algorithm. National Conference on Advances in Computing. NCAC2015, 1 (December 2015), 18-22.

@article{

author = { Sayali Khodade and Roshani Ade },

title = { Trinity for Web Data Extraction using Efficient Algorithm },

journal = { National Conference on Advances in Computing },

issue_date = { December 2015 },

volume = { NCAC2015 },

number = { 1 },

month = { December },

year = { 2015 },

issn = { 0975-8887 },

pages = { 18-22 },

numpages = 5,

url = { /proceedings/ncac2015/number1/23356-5014/ },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Proceeding Article

%1 National Conference on Advances in Computing

%A Sayali Khodade

%A Roshani Ade

%T Trinity for Web Data Extraction using Efficient Algorithm

%J National Conference on Advances in Computing

%@ 0975-8887

%V NCAC2015

%N 1

%P 18-22

%D 2015

%I International Journal of Computer Applications

Abstract

The number of internet users keeps growing, and the web holds a huge collection of data that is useful to them. Web data extractors are used to pull this data out of web documents. The proposed approach takes as input two or more web records generated by the same server-side template and learns a regular expression that models the template and can later be used to extract information from similar records. It relies on the observation that the template introduces shared patterns that carry no relevant data and can therefore be disregarded. The technique gives better results for multiword queries than other existing techniques, and errors in the input do not reduce its effectiveness.
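The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, only a simplified illustration under stated assumptions: records are plain strings, the "shared pattern" is taken to be the longest substring common to all records, and records are split around its first occurrence. The function names (longest_shared_pattern, learn_regex, extract), the min_len threshold, and the sample pages are all hypothetical.

```python
import re
from typing import List, Optional


def longest_shared_pattern(texts: List[str], min_len: int) -> str:
    """Return the longest substring found in every text, or '' if none
    reaches min_len characters (brute force, fine for a small sketch)."""
    base = min(texts, key=len)
    for size in range(len(base), min_len - 1, -1):
        for start in range(len(base) - size + 1):
            candidate = base[start:start + size]
            if all(candidate in t for t in texts):
                return candidate
    return ""


def learn_regex(texts: List[str], min_len: int = 3) -> str:
    """Split the texts around shared patterns, recursively, and emit a regex:
    shared patterns become literals, everything else a capture group."""
    if all(t == "" for t in texts):
        return ""
    shared = longest_shared_pattern(texts, min_len)
    if not shared:
        # Nothing in common: assume this region holds the variable data.
        return "(.*?)"
    prefixes, suffixes = [], []
    for t in texts:
        i = t.index(shared)  # assumption: split at the first occurrence
        prefixes.append(t[:i])
        suffixes.append(t[i + len(shared):])
    return (learn_regex(prefixes, min_len)
            + re.escape(shared)
            + learn_regex(suffixes, min_len))


def extract(pattern: str, text: str) -> Optional[List[str]]:
    """Apply a learned regex to a new record; None if it does not match."""
    match = re.fullmatch(pattern, text, flags=re.DOTALL)
    return list(match.groups()) if match else None


# Hypothetical records produced by the same server-side template.
pages = [
    "<html><b>Title:</b> Dune <i>Price:</i> 9.99</html>",
    "<html><b>Title:</b> Emma <i>Price:</i> 12.50</html>",
]
pattern = learn_regex(pages)
new_page = "<html><b>Title:</b> Ulysses <i>Price:</i> 15.00</html>"
print(extract(pattern, new_page))  # ['Ulysses', '15.00']
```

The sketch treats any text common to all inputs as template, so data values that happen to share three or more characters would be misclassified as template; raising min_len or supplying more sample records reduces that risk in this toy version.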

Index Terms

Computer Science

Information Sciences

Keywords

Web Data Extraction, Automatic Wrapper Generation, Wrappers, Unsupervised Learning