XML: URL Data Set Creation for Future Web Mining Research Avenues

Research Article

Published in March 2012 by Krishna Murthy. A, Suresha

International Conference in Computational Intelligence

Foundation of Computer Science USA

ICCIA - Number 3

March 2012

Authors: Krishna Murthy. A, Suresha

Krishna Murthy. A, Suresha. XML: URL Data Set Creation for Future Web Mining Research Avenues. International Conference in Computational Intelligence. ICCIA, 3 (March 2012), 1-4.

@article{

author = { Krishna Murthy. A, Suresha },

title = { XML: URL Data Set Creation for Future Web Mining Research Avenues },

journal = { International Conference in Computational Intelligence },

issue_date = { March 2012 },

volume = { ICCIA },

number = { 3 },

month = { March },

year = { 2012 },

issn = { 0975-8887 },

pages = { 1-4 },

numpages = 4,

url = { /proceedings/iccia/number3/5111-1024/ },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Proceeding Article

%1 International Conference in Computational Intelligence

%A Krishna Murthy. A

%A Suresha

%T XML: URL Data Set Creation for Future Web Mining Research Avenues

%J International Conference in Computational Intelligence

%@ 0975-8887

%V ICCIA

%N 3

%P 1-4

%D 2012

%I International Journal of Computer Applications

Abstract

The rapid expansion of the Internet has made the Web a popular place for disseminating and collecting information, and it has opened up many research topics in various fields. Over the last few years, several attempts have been made at Web-based research, particularly on HTML web pages because of their wide availability. As a result, many research data sets have been created, and most of them are available on the Web. However, the World Wide Web Consortium (W3C) has stated that HTML does not provide a good description of the semantic structure of web page contents. To overcome this drawback, web developers have started to build web pages with newer technologies such as XML and Flash [1], which opens the way for new research methods. This article focuses on creating a data set of XML web pages using Sequential Search, Link Extraction, and String-based Classification methods, to support future research on XML web pages.
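The abstract names three steps (Link Extraction, Sequential Search, and String-based Classification) without giving their details. The following is a minimal Python sketch of how such a pipeline might look, not the authors' implementation. The class and function names, the sample page, the example.com URLs, and the rule "a URL is an XML URL if its path ends in .xml" are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page (link extraction)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def classify_url(url):
    """String-based rule (an assumption): label by the path's file extension."""
    path = urlparse(url).path.lower()
    return "xml" if path.endswith(".xml") else "other"


def build_data_set(html, base_url):
    """Scan the extracted links one by one and keep each XML URL once."""
    data_set = []
    for url in extract_links(html, base_url):  # sequential scan
        if classify_url(url) == "xml" and url not in data_set:
            data_set.append(url)
    return data_set


# Hypothetical page content used only to exercise the sketch.
SAMPLE_PAGE = """
<a href="/feeds/news.xml">News feed</a>
<a href="about.html">About</a>
<a href="http://example.org/data/catalog.XML?v=2">Catalog</a>
<a href="/feeds/news.xml">News feed again</a>
"""

if __name__ == "__main__":
    for url in build_data_set(SAMPLE_PAGE, "http://example.com/index.html"):
        print(url)
```

Classifying on the URL string alone avoids downloading every linked page, at the cost of missing XML documents served from URLs without an .xml extension.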

References

Book: Ed Tittel, "Complete Coverage of XML", Tata McGraw-Hill Edition.
Book: Magdalini Eirinaki, "WEB MINING: A ROADMAP".
Lan Yi, Bing Liu, and Xiaoli Li, 2003, "Eliminating noisy information in web pages for data mining". In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 296-305, New York, NY, USA. ACM.
http://www.w3c.org/DOM/
P.F Xiang et al., 2006, "Effective Page Segmentation Combining Pattern Analysis and Visual Separators for Browsing on Small Screens", Web Intelligence.
Shumeet Baluja, 2006, "Browsing on small screens: recasting web-page segmentation into an efficient machine learning framework". In WWW '06: Proceedings of the 15th international conference on World Wide Web, pages 33-42, New York, NY, USA. ACM.
Y. Chen, X. Xie, W.-Y. Ma, and H.-J. Zhang, 2005, "Adapting web pages for small-screen devices", Internet Computing, 9(1):50-56.
Xin Yang, Yuanchun Shi, 2009, "Enhanced Gestalt Theory Guided Web Page Segmentation for Mobile Browsing", IEEE/WIC/ACM.
Jaideep Srivastava, Robert Cooley, et al., 2000, "Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data", Volume 1, Issue 2 - page 12, ACM SIGKDD.
Abraham. A, "Business Intelligence from Web Usage Mining", Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co., Singapore, Vol. 2, No. 4, pp. 375-390, 2003.
Soumen Chakrabarti, 2000, "Data mining for hypertext: A tutorial survey", Volume 1, Issue 2 - page 1, ACM SIGKDD.
Soumen Chakrabarti, Byron E. Dom et al., 1999, "Mining the Link Structure of the World Wide Web". IBM Almaden Research Center, 650 Harry Road, San Jose CA 95120.
C. Kohlschütter and W. Nejdl, 2008, "A Densitometric Approach to Web Page Segmentation". In ACM 17th Conf. on Information and Knowledge Management (CIKM 2008).
Christian Kohlschütter, Peter Fankhauser, Wolfgang Nejdl, 2010, "Boilerplate Detection using Shallow Text Features", WSDM, New York, USA, ACM.
G. Poonkuzhali, K. Thiagarajan, and K. Sarukesi, 2009, "Signed Approach for Mining Web content Outliers", World Academy of Science, Engineering and Technology 56.
Bar-Yossef, Z. and Rajagopalan, S., 2002, "Template Detection via Data Mining and its Applications". In Proceedings of the 11th International World Wide Web Conference (WWW2002).
Lin, S.-H. and Ho, J.-M., 2002, "Discovering Informative Content Blocks from Web Documents". In Proceedings of ACM SIGKDD'02.

Index Terms

Computer Science

Information Sciences

Keywords

URL data set, XML URLs, URL Extraction, URL Classification