Research Article

Microarrays Data Analysis for Cancer Disease on a Cluster of Computers

by Amal Khalifa, Dina Elsayad
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 95 - Number 20
Year of Publication: 2014

Amal Khalifa, Dina Elsayad. Microarrays Data Analysis for Cancer Disease on a Cluster of Computers. International Journal of Computer Applications. 95, 20 (June 2014), 13-20. DOI=10.5120/16709-6864

@article{ 10.5120/16709-6864,
author = { Amal Khalifa and Dina Elsayad },
title = { Microarrays Data Analysis for Cancer Disease on a Cluster of Computers },
journal = { International Journal of Computer Applications },
issue_date = { June 2014 },
volume = { 95 },
number = { 20 },
month = { June },
year = { 2014 },
issn = { 0975-8887 },
pages = { 13-20 },
numpages = {9},
url = { },
doi = { 10.5120/16709-6864 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:19:56.545090+05:30
%A Amal Khalifa
%A Dina Elsayad
%T Microarrays Data Analysis for Cancer Disease on a Cluster of Computers
%J International Journal of Computer Applications
%@ 0975-8887
%V 95
%N 20
%P 13-20
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Clustering is one of the most active research areas in microarray data analysis. In clustering, a set of observations is partitioned into subsets (called clusters) such that observations in the same cluster are similar in some sense. One family of clustering approaches is based on the minimum spanning tree (MST). MST-based clustering techniques consist of three main phases: MST construction, identification of inconsistent edges, and identification of clusters. The CLUMP algorithm (Clustering through MST in Parallel) is one such MST-based clustering algorithm; it was enhanced in the iCLUMP algorithm, which uses the cover tree data structure. This paper presents a further improvement, called iCLUMP-2, that enhances the edge inconsistency measure employed by both CLUMP and iCLUMP. The performance of the implemented algorithm was tested on a 45-node cluster using cancer microarray data sets. The results showed that the proposed algorithm outperformed both CLUMP and iCLUMP, providing better speedup and efficiency. Furthermore, the quality of the clusters produced by iCLUMP-2 is much better than that of the clusters produced by CLUMP and iCLUMP.
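The three phases of MST-based clustering can be illustrated with a minimal, sequential Python sketch. It is not CLUMP, iCLUMP, or iCLUMP-2: it runs on a single machine, builds the MST with a plain O(n²) Prim's algorithm rather than a cover tree, and substitutes a simple, assumed inconsistency rule (an edge is inconsistent if its weight exceeds the mean plus k standard deviations of all MST edge weights) for the measures used in those algorithms. The function names and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_mst(points: List[Sequence[float]]) -> List[Edge]:
    """Phase 1: Prim's algorithm on the complete Euclidean graph (O(n^2) distance evaluations)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax distances from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def inconsistent_edges(edges: List[Edge], k: float = 2.0) -> List[Edge]:
    """Phase 2 (assumed global rule, NOT the CLUMP/iCLUMP/iCLUMP-2 measure):
    flag edges heavier than mean + k * std of all MST edge weights."""
    if not edges:
        return []
    weights = [w for _, _, w in edges]
    mean = sum(weights) / len(weights)
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
    return [e for e in edges if e[2] > mean + k * std]


def clusters(n: int, edges: List[Edge], cut: List[Edge]) -> List[int]:
    """Phase 3: drop the inconsistent edges; the remaining connected components are the clusters."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    cut_set = set(cut)
    for e in edges:
        if e not in cut_set:
            parent[find(e[0])] = find(e[1])
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    mst = build_mst(pts)
    print(clusters(len(pts), mst, inconsistent_edges(mst)))  # prints [0, 0, 0, 0, 1, 1, 1, 1]
```

On the two well-separated groups of four points in the usage example, the single long MST edge bridging the groups is flagged as inconsistent, and removing it leaves two clusters. A global threshold like this is easy to state but can mishandle clusters of differing density, which is why many MST-based methods instead compare each edge with the edges near it.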

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, microarrays, cancer, parallel algorithm, minimum spanning tree