Research Article

Performance Comparison of Web Data Extraction Techniques

by Neeraj Raheja, Vijay Kumar Katiyar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 135 - Number 11
Year of Publication: 2016
Authors: Neeraj Raheja, Vijay Kumar Katiyar
10.5120/ijca2016908537

Neeraj Raheja, Vijay Kumar Katiyar. Performance Comparison of Web Data Extraction Techniques. International Journal of Computer Applications. 135, 11 (February 2016), 6-13. DOI=10.5120/ijca2016908537

@article{ 10.5120/ijca2016908537,
author = { Neeraj Raheja and Vijay Kumar Katiyar },
title = { Performance Comparison of Web Data Extraction Techniques },
journal = { International Journal of Computer Applications },
issue_date = { February 2016 },
volume = { 135 },
number = { 11 },
month = { February },
year = { 2016 },
issn = { 0975-8887 },
pages = { 6-13 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume135/number11/24091-2016908537/ },
doi = { 10.5120/ijca2016908537 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:35:30.084706+05:30
%A Neeraj Raheja
%A Vijay Kumar Katiyar
%T Performance Comparison of Web Data Extraction Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 135
%N 11
%P 6-13
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Today's websites hold large amounts of data to meet the requirements of their users, and web data extraction systems help users extract the required data from such sites. The basic techniques for web data extraction are manual extraction and web wrappers; web wrappers are further divided into wrapper induction and automatic approaches, and many methods exist for each. This work compares the performance of the manual, wrapper induction, and automatic approaches using representative methods for each: manual extraction (by manual effort), nX1 (wrapper induction), and DEPTA and MDR (automatic). The results are compared on precision, recall, F-measure, and data extraction time.
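
As an illustration of the accuracy parameters named above, the following minimal sketch computes precision, recall, and F-measure for one extraction run. It assumes the standard information-retrieval definitions (the abstract does not spell out the exact formulas used in the paper), and the function name `extraction_metrics` and the record identifiers are hypothetical.

```python
def extraction_metrics(extracted, relevant):
    """Return (precision, recall, f_measure) for one extraction run.

    extracted: records the extraction method returned
    relevant:  records that are actually present on the page (ground truth)
    Assumes the standard definitions; not taken from the paper itself.
    """
    extracted, relevant = set(extracted), set(relevant)
    correct = len(extracted & relevant)  # correctly extracted records

    # Guard against empty sets to avoid division by zero.
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(relevant) if relevant else 0.0

    # F-measure is the harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical run: 4 records extracted, 5 truly present, 3 in common.
p, r, f = extraction_metrics(
    extracted={"r1", "r2", "r3", "r4"},
    relevant={"r1", "r2", "r3", "r5", "r6"},
)
print(f"precision={p:.2f} recall={r:.2f} F-measure={f:.2f}")
```

In this hypothetical run, three of the four extracted records are correct and three of the five true records are found, so precision is 0.75 and recall is 0.60.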

References
  1. C. Chang, S. Lui, “IEPAD: Information extraction based on pattern discovery”, in WWW, pp. 681-688, 2001.
  2. Neeraj Raheja, V. K. Katiyar, “A Noise Reduction Approach based on n x 1 table and XSL display method for efficient web data extraction”, IJCA International Journal of Computer Applications (0975 – 8887), Vol. 64, No. 11, pp. 12-17, February 2013.
  3. Y. Zhai, B. Liu, “Web data extraction based on partial tree alignment”, in WWW, pp. 76-85, 2005.
  4. B. Liu, R. L. Grossman, Yanhong Zhai, “Mining data records in Web pages”, in KDD, pp. 601-606, 2003.
  5. Bing Liu and Yanhong Zhai, “NET - A System for Extracting Web Data from Flat and Nested Data Records”, Proceedings of the 6th International Conference on Web Information Systems Engineering (WISE-05), 2005.
  6. Valter Crescenzi et al., “ROADRUNNER: Towards Automatic Data Extraction from Large Web Sites”, 2001.
  7. Zhai Y and Liu B, “Extracting Web data using instance-based learning” in WISE-05, 2005.
  8. Arasu A and Garcia-Molina H., “Extracting Structured Data from Web Pages”, in SIGMOD-03, 2003.
  9. Lerman K., Getoor L., Minton, S. and Knoblock C, “Using the Structure of Web Sites for Automatic segmentation of Tables”, SIGMOD-04, 2004.
  10. J. Hammer, H. Garcia-Molina, J. Cho, and A. Crespo, “Extracting semi structured information from the web”, in Proceedings of the Workshop on the Management of Semi-structured Data, 1997.
  11. Chang C-H., Lui, S-L., “IEPAD: Information Extraction Based on Pattern Discovery”, WWW-01, 2001.
  12. Kushmerick N., “Wrapper Induction: Efficiency and Expressiveness”, Artificial Intelligence, 2000.
  13. P. S. Hiremath, Siddu P. Algur, “Extraction of Data from Web Pages: A Vision Based Approach”, International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol 3, No 3, pp. 623-632, 2009.
  14. Faustina Johnson, Santosh Kumar, “Web Content Mining Using Genetic Algorithm”, in Advances in Computing, Communication, and Control, Communications in Computer and Information Science (Springer), Vol. 361, pp. 82-93, 2013.
Index Terms

Computer Science
Information Sciences

Keywords

Web data extraction, manual, web wrapper, nX1, DEPTA, MDR.