Research Article

A New Homogeneity Inter-Clusters Measure in Semi-Supervised Clustering

by Badreddine Meftahi, Ourida Ben Boubaker Saidi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Number 24
Year of Publication: 2013
Authors: Badreddine Meftahi, Ourida Ben Boubaker Saidi
DOI: 10.5120/11267-6526

Badreddine Meftahi, Ourida Ben Boubaker Saidi. A New Homogeneity Inter-Clusters Measure in Semi-Supervised Clustering. International Journal of Computer Applications. 66, 24 (March 2013), 37-45. DOI=10.5120/11267-6526

@article{ 10.5120/11267-6526,
author = { Badreddine Meftahi and Ourida Ben Boubaker Saidi },
title = { A New Homogeneity Inter-Clusters Measure in Semi-Supervised Clustering },
journal = { International Journal of Computer Applications },
issue_date = { March 2013 },
volume = { 66 },
number = { 24 },
month = { March },
year = { 2013 },
issn = { 0975-8887 },
pages = { 37-45 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume66/number24/11267-6526/ },
doi = { 10.5120/11267-6526 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:23:19.983196+05:30
%A Badreddine Meftahi
%A Ourida Ben Boubaker Saidi
%T A New Homogeneity Inter-Clusters Measure in Semi-Supervised Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 66
%N 24
%P 37-45
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Many studies in data mining have proposed a new form of learning called semi-supervised learning. This type of learning combines unlabeled data with labeled data, which are hard to obtain, whereas unsupervised methods use only unlabeled data. The significance and effectiveness of semi-supervised clustering results are becoming increasingly important. This paper pursues the thesis that much greater accuracy can be achieved in such clustering by improving the similarity computation. Hence, we introduce a new approach to semi-supervised clustering that uses a new homogeneity measure for the generated clusters. Our experimental results demonstrate significantly improved accuracy.

Index Terms

Computer Science
Information Sciences

Keywords

Semi-supervised clustering, distance computation, homogeneity measure