Research Article

Feature-based Clustering of Web Data Sources

by Alsayed Algergawy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 60 - Number 5
Year of Publication: 2012
Authors: Alsayed Algergawy

Alsayed Algergawy. Feature-based Clustering of Web Data Sources. International Journal of Computer Applications. 60, 5 (December 2012), 1-4. DOI=10.5120/9685-4127

@article{ 10.5120/9685-4127,
author = { Alsayed Algergawy },
title = { Feature-based Clustering of Web Data Sources },
journal = { International Journal of Computer Applications },
issue_date = { December 2012 },
volume = { 60 },
number = { 5 },
month = { December },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { },
doi = { 10.5120/9685-4127 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:05:47.933585+05:30
%A Alsayed Algergawy
%T Feature-based Clustering of Web Data Sources
%J International Journal of Computer Applications
%@ 0975-8887
%V 60
%N 5
%P 1-4
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

The proliferation of web data sources creates a growing demand for integrating them. To facilitate the integration process, a pre-analysis step is required to classify and group data sources into their correct domains. In this paper, we propose a feature-based approach that clusters web data sources without any human intervention, relying only on features extracted from the source schemas. In particular, we make use of both linguistic and structural schema features. We experimentally demonstrate the effectiveness of the proposed approach in terms of both clustering quality and runtime.
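The abstract does not say which concrete features or which clustering algorithm the paper uses, so the following is only a minimal Python sketch of the general idea under our own assumptions, not the author's method. Each schema is modelled as a hypothetical tree of `(label, children)` pairs; the linguistic feature is the set of tokens in the labels (compared with Jaccard similarity), the structural features are the leaf count and tree depth (compared by min/max ratio), and the weight `alpha`, the `threshold`, and all function names are illustrative choices. Grouping is done by single-link clustering cut at a threshold, implemented as connected components with union-find.

```python
import re
from itertools import combinations

# Hypothetical schema model: (label, [children]); leaves are form fields/attributes.

def tokenize(label):
    """Split a label such as 'departureCity' or 'return_date' into lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}

def extract_features(node, depth=1):
    """Return (token set, leaf count, max depth) for a schema tree."""
    label, children = node
    tokens = tokenize(label)
    if not children:
        return tokens, 1, depth
    leaves, max_depth = 0, depth
    for child in children:
        t, n, d = extract_features(child, depth + 1)
        tokens |= t
        leaves += n
        max_depth = max(max_depth, d)
    return tokens, leaves, max_depth

def ratio(a, b):
    """Similarity of two positive counts: 1.0 when equal, smaller as they diverge."""
    return min(a, b) / max(a, b)

def similarity(fa, fb, alpha=0.7):
    """Assumed weighted mix of linguistic (Jaccard) and structural similarity."""
    ta, na, da = fa
    tb, nb, db = fb
    ling = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    struct = (ratio(na, nb) + ratio(da, db)) / 2
    return alpha * ling + (1 - alpha) * struct

def cluster(schemas, threshold=0.4):
    """Single-link clustering cut at `threshold`, via union-find on the similarity graph."""
    names = list(schemas)
    feats = {n: extract_features(schemas[n]) for n in names}
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if similarity(feats[a], feats[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Toy, made-up query-form schemas from two domains.
schemas = {
    "airline_a": ("flightSearch", [("departureCity", []), ("arrivalCity", []), ("departureDate", [])]),
    "airline_b": ("flights", [("departure_city", []), ("arrival_city", []), ("return_date", [])]),
    "books_a": ("bookSearch", [("title", []), ("author", []), ("isbn", [])]),
    "books_b": ("books", [("book_title", []), ("author_name", []), ("publisher", [])]),
}

print(cluster(schemas))  # [['airline_a', 'airline_b'], ['books_a', 'books_b']]
```

Here all four toy schemas share the same structure, so only the shared label tokens separate the two domains; in a real setting the feature set, weighting, and clustering algorithm would need to be chosen and tuned against labelled data.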

Index Terms

Computer Science
Information Sciences


Web data source, Data integration, Clustering, Performance