Feature-based Clustering of Web Data Sources

Alsayed Algergawy

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 60 - Number 5

Year of Publication: 2012

Authors: Alsayed Algergawy

DOI: 10.5120/9685-4127

Alsayed Algergawy. Feature-based Clustering of Web Data Sources. International Journal of Computer Applications. 60, 5 (December 2012), 1-4. DOI=10.5120/9685-4127

@article{10.5120/9685-4127,
  author = {Alsayed Algergawy},
  title = {Feature-based Clustering of Web Data Sources},
  journal = {International Journal of Computer Applications},
  issue_date = {December 2012},
  volume = {60},
  number = {5},
  month = {December},
  year = {2012},
  issn = {0975-8887},
  pages = {1--4},
  numpages = {4},
  url = {https://ijcaonline.org/archives/volume60/number5/9685-4127/},
  doi = {10.5120/9685-4127},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%A Alsayed Algergawy
%T Feature-based Clustering of Web Data Sources
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 5
%P 1-4
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

The proliferation of web data sources creates a growing need to integrate them. To facilitate integration, a pre-analysis step is required that classifies data sources and groups them into their correct domains. In this paper, we propose a feature-based approach that clusters web data sources without human intervention, relying only on features extracted from the source schemas. In particular, we use both linguistic and structural schema features. Experiments demonstrate the effectiveness of the proposed approach in terms of both clustering quality and runtime.
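The abstract does not specify which features are extracted or which clustering algorithm is used, so the following is only a hypothetical sketch of the general idea, not the author's method. It assumes each schema is a nested dict (element name to `None` for a leaf, or to a dict of sub-elements), takes the words in element names as the linguistic feature and the leaf count and nesting depth as structural features, blends them with an assumed weight `w_ling`, and groups sources by single-link agglomerative clustering with an assumed `threshold`. The function names and the sample schemas in `SCHEMAS` are invented for illustration.

```python
import re
from itertools import combinations

# Hypothetical input: each source schema is a nested dict mapping an element
# name to None (a leaf field) or to a dict of sub-elements.
SCHEMAS = {
    "airline_a": {"fromCity": None, "toCity": None, "departDate": None, "returnDate": None},
    "airline_b": {"departCity": None, "arrivalCity": None, "departDate": None, "passengers": None},
    "books_a": {"title": None, "author": {"firstName": None, "lastName": None}, "isbn": None},
    "books_b": {"bookTitle": None, "authorName": None, "publisher": None},
}


def tokens(name):
    """Linguistic feature: split 'departCity' or 'return_date' into lowercase words."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def features(schema, depth=1):
    """Return (name tokens, leaf count, maximum nesting depth) for one schema."""
    toks, leaves, max_depth = set(), 0, depth
    for name, child in schema.items():
        toks |= tokens(name)
        if isinstance(child, dict) and child:
            sub_toks, sub_leaves, sub_depth = features(child, depth + 1)
            toks |= sub_toks
            leaves += sub_leaves
            max_depth = max(max_depth, sub_depth)
        else:
            leaves += 1
    return toks, leaves, max_depth


def similarity(fa, fb, w_ling=0.7):
    """Weighted mix of token Jaccard similarity and a simple structural similarity."""
    ta, la, da = fa
    tb, lb, db = fb
    union = ta | tb
    ling = len(ta & tb) / len(union) if union else 0.0
    # Structural similarity: 1 minus the mean relative difference in leaf count and depth.
    struct = 1.0 - (abs(la - lb) / max(la, lb, 1) + abs(da - db) / max(da, db)) / 2
    return w_ling * ling + (1 - w_ling) * struct


def cluster(schemas, threshold=0.4):
    """Single-link agglomerative clustering: keep merging any two clusters that
    contain a pair of sources whose similarity reaches the threshold."""
    feats = {name: features(s) for name, s in schemas.items()}
    clusters = [{name} for name in schemas]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if any(similarity(feats[a], feats[b]) >= threshold
                   for a in clusters[i] for b in clusters[j]):
                other = clusters.pop(j)  # j > i, so index i stays valid
                clusters[i] |= other
                merged = True
                break
    return clusters


if __name__ == "__main__":
    print(sorted(sorted(c) for c in cluster(SCHEMAS)))
```

With these invented schemas, the two airline forms share the words city, depart, and date and have the same shape, so they merge; the two book forms merge on title, author, and name; no cross-domain pair reaches the threshold, so the script prints `[['airline_a', 'airline_b'], ['books_a', 'books_b']]`. Single-link merging is used here only for brevity; on real data the weight, the threshold, and the feature set would all need tuning.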

Index Terms

Computer Science

Information Sciences

Keywords

Web data source, Data integration, Clustering, Performance