CFP last date
20 May 2024
Reseach Article

Scalability of Parallel Genetic Algorithm for Two-mode Clustering

by Briti Deb, Satish Narayana Srirama
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 14
Year of Publication: 2014
Authors: Briti Deb, Satish Narayana Srirama
10.5120/16411-5829

Briti Deb, Satish Narayana Srirama . Scalability of Parallel Genetic Algorithm for Two-mode Clustering. International Journal of Computer Applications. 94, 14 ( May 2014), 23-26. DOI=10.5120/16411-5829

@article{ 10.5120/16411-5829,
author = { Briti Deb, Satish Narayana Srirama },
title = { Scalability of Parallel Genetic Algorithm for Two-mode Clustering },
journal = { International Journal of Computer Applications },
issue_date = { May 2014 },
volume = { 94 },
number = { 14 },
month = { May },
year = { 2014 },
issn = { 0975-8887 },
pages = { 23-26 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume94/number14/16411-5829/ },
doi = { 10.5120/16411-5829 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:17:38.910448+05:30
%A Briti Deb
%A Satish Narayana Srirama
%T Scalability of Parallel Genetic Algorithm for Two-mode Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 94
%N 14
%P 23-26
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Data matrix having the same set of entity in the rows and cloumns is known as one-mode data matrix, and traditional one-mode clustering algorithms can be used to cluster the rows (or columns) separately. With the popularity of use of two-mode data matrices where the rows and columns have different sets of entities, the need for simultaneous clustering of rows and columns popularly known as two-mode clustering increased. Additionally, the emergence of large data sets and the prediction of Moore's law slow-down have created the challenge of clustering scalability. In this paper, we address the problem of scalability of organizing an unlabelled two-mode dataset into clusters utilizing multicore processor. We propose a parallel genetic algorithm (GA) heuristics based two-mode clustering algorithm, which is an adaptation of the classical Cuthill-McKee Matrix Bandwidth Minimization (MBM) algorithm. The classical MBM method aims at reducing the bandwidth of a sparse symmetric matrix, which we adapted to make it suitable for non-symmetric real-valued matrix. Preliminary results indicate that our algorithm is scalable on multicore processor compared to serial implementation. Future work will include more extensive experiments and evaluations of the system.

References
  1. Suderman, M. , & Hallett, M. 2007. Tools for visually exploring biological networks. Bioinformatics, 23(20), 2651-2659.
  2. Chu, Cheng-Tao, et al. Map-reduce for machine learning on multicore. NIPS. Vol. 6. (2006).
  3. Borgatti, Stephen P. , and Martin G. Everett. Network analysis of 2-mode data. Social networks 19. 3 (1997): 243-269.
  4. Van Mechelen, Iven, Hans-Hermann Bock, and Paul De Boeck. Two-mode clustering methods: a structured overview. Statistical methods in medical research 13. 5 (2004): 363-394.
  5. E. Cuthill and J. McKee. Reducing the bandwidth of sparse symmetric matrices In Proc. 24th Nat. Conf. ACM, pages 157–172, (1969).
  6. Hageman, J. A. , van den Berg, R. A. , Westerhuis, J. A. , van der Werf, M. J. , & Smilde, A. K. (2008). Genetic algorithm based two-mode clustering of metabolomics data. Metabolomics, 4(2), 141-149.
  7. Estivill-Castro, Vladimir (June 2002). Why so many clustering algorithms — A Position Paper. ACM SIGKDD Explorations Newsletter 4 (1): 65–75.
  8. Cheng, Y. , & Church, G. M. (2000, August). Biclustering of expression data. In ISMB, Vol. 8, pp. 93-103.
  9. Dhillon, I. S. 2001, August. Co-clustering documents and words using bipartite spectral graph partitioning. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 269-274. ACM.
  10. Liiv, I. 2010. Seriation and matrix reordering methods: An historical overview. Statistical analysis and data mining, 3(2), 70-91.
  11. Schaeffer, S. E. 2007. Graph clustering. Computer Science Review, 1(1), 27-64.
  12. Esposito, A. , Catalano, M. S. , Malucelli, F. , & Tarricone, L. 1998. A new matrix bandwidth reduction algorithm. Operations Research Letters, 23(3), 99-107.
  13. Hageman, J. A. , Malosetti, M. , & Van Eeuwijk, F. A. 2012. Two-mode clustering of genotype by trait and genotype by environment data. Euphytica, 183(3), 349-359.
  14. Zhang, P. , Wang, J. , Li, X. , Li, M. , Di, Z. , & Fan, Y. 2008. Clustering coefficient and community structure of bipartite networks. Physica A: Statistical Mechanics and its Applications, 387(27), 6869-6875.
  15. Arabie, P. , Schleutermann, S. , Daws, J. , and Hubert, L. 1988, Marketing Applications of Sequencing and Partitioning of Nonsymmetric and/or Two-Mode Matrices, in Data, Expert Knowledge and Decisions, Eds. W. Gaul, and M. Schader, Berlin: Springer-Verlag, 215-224.
  16. Fonseca, Y. C. F. A bipartite graph co-clustering approach to ontology mapping, Proceedings of the Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data. Colocated with the Second International Semantic Web Conference (ISWC-03), CEUR-WS. org, 2003.
  17. Madeira, S. C. , & Oliveira, A. L. 2004. Biclustering algorithms for biological data analysis: a survey. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 1(1), 24-45. Chicago
  18. Scrucca, L. 2012. GA: a package for genetic algorithms in R. Journal of Statistical Software, Volume 53, Issue 4.
  19. Yeast expression matrix, URL accessed on 02-01-2014: http://arep. med. harvard. edu/biclustering/yeast. matrix
Index Terms

Computer Science
Information Sciences

Keywords

Scalability two-mode clustering parallel genetic algorithm matrix reordering