Research Article

Scalable Parallel Clustering Approach for Large Data using Possibilistic Fuzzy C-Means Algorithm

by Juby Mathew, R Vijayakumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 103 - Number 9
Year of Publication: 2014
Authors: Juby Mathew, R Vijayakumar
10.5120/18103-9195

Juby Mathew, R Vijayakumar. Scalable Parallel Clustering Approach for Large Data using Possibilistic Fuzzy C-Means Algorithm. International Journal of Computer Applications. 103, 9 (October 2014), 24-29. DOI=10.5120/18103-9195

@article{ 10.5120/18103-9195,
author = { Juby Mathew and R Vijayakumar },
title = { Scalable Parallel Clustering Approach for Large Data using Possibilistic Fuzzy C-Means Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { October 2014 },
volume = { 103 },
number = { 9 },
month = { October },
year = { 2014 },
issn = { 0975-8887 },
pages = { 24-29 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume103/number9/18103-9195/ },
doi = { 10.5120/18103-9195 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:34:58.209218+05:30
%A Juby Mathew
%A R Vijayakumar
%T Scalable Parallel Clustering Approach for Large Data using Possibilistic Fuzzy C-Means Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 103
%N 9
%P 24-29
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is an unsupervised learning task that seeks a finite set of categories, termed clusters, to describe the data. The proposed system tries to exploit the computational power of multicore processors by modifying the design of existing algorithms and software. Existing clustering algorithms, however, either handle different data types but are inefficient on large data, or handle large data but are limited to numeric attributes. Parallel clustering has therefore emerged as an important contribution to clustering large data. This paper introduces a scalable parallel clustering algorithm, based on Possibilistic Fuzzy C-Means (PFCM), for clustering large data. To harvest the full power of a multi-core processor, a software application must be able to execute tasks in parallel across all available CPUs. To achieve this, the approach uses the fork/join framework in Java, which is among the most effective design techniques for obtaining good parallel performance. An experimental analysis was carried out to evaluate the feasibility of the scalable PFCM clustering approach. It showed that the proposed approach has the upper hand over the existing method in terms of accuracy, classification error percentage, and time.
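The fork/join idea mentioned in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation: the class name `ParallelMembership`, the `THRESHOLD` value, and the choice to parallelize only the standard fuzzy c-means membership update (one ingredient of PFCM, with the typicality and center updates left out) are assumptions made for illustration. The task splits the range of data points in half recursively until a chunk is small enough to process sequentially, which lets the pool's worker threads share the load.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

/** Hypothetical sketch: fuzzy membership update spread over cores with fork/join. */
final class ParallelMembership {
    // Points handled sequentially by one leaf task (assumed value, tune per machine).
    private static final int THRESHOLD = 1_000;

    /** Fills the membership rows for the points in the half-open range [lo, hi). */
    private static final class UpdateTask extends RecursiveAction {
        private final double[][] points, centers, u;
        private final double m;
        private final int lo, hi;

        UpdateTask(double[][] points, double[][] centers, double[][] u,
                   double m, int lo, int hi) {
            this.points = points; this.centers = centers; this.u = u;
            this.m = m; this.lo = lo; this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo <= THRESHOLD) {            // small enough: do the work directly
                for (int k = lo; k < hi; k++) updateRow(points[k], centers, u[k], m);
                return;
            }
            int mid = (lo + hi) >>> 1;             // otherwise split and run both halves
            invokeAll(new UpdateTask(points, centers, u, m, lo, mid),
                      new UpdateTask(points, centers, u, m, mid, hi));
        }
    }

    /** Standard FCM membership of one point in every cluster, fuzzifier m > 1. */
    static void updateRow(double[] x, double[][] centers, double[] row, double m) {
        double[] d2 = new double[centers.length];
        for (int i = 0; i < centers.length; i++) {
            d2[i] = squaredDistance(x, centers[i]);
            if (d2[i] == 0.0) {                    // point sits on a center: crisp membership
                Arrays.fill(row, 0.0);
                row[i] = 1.0;
                return;
            }
        }
        double exponent = 1.0 / (m - 1.0);
        double sum = 0.0;
        for (int i = 0; i < centers.length; i++) {
            row[i] = Math.pow(1.0 / d2[i], exponent);
            sum += row[i];
        }
        for (int i = 0; i < centers.length; i++) row[i] /= sum;  // rows sum to 1
    }

    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    /** Returns the membership matrix u[k][i] for point k and cluster i. */
    static double[][] compute(double[][] points, double[][] centers, double m) {
        double[][] u = new double[points.length][centers.length];
        ForkJoinPool.commonPool().invoke(
                new UpdateTask(points, centers, u, m, 0, points.length));
        return u;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 1.0}, {5.0, 5.0}, {9.0, 10.0}};
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        for (double[] row : compute(points, centers, 2.0)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each leaf task writes only to its own rows of `u`, so no locking is needed. A full PFCM iteration would add the typicality update and the center recomputation, and the latter needs a reduction step across tasks.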

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, parallel k-means, Fuzzy C-Means, Possibilistic Fuzzy C-Means, Fork/Join