Research Article

Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures

by Angela Makolo, Taiwo Adigun
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 139 - Number 13
Year of Publication: 2016
Authors: Angela Makolo, Taiwo Adigun
10.5120/ijca2016909413

Angela Makolo, Taiwo Adigun. Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures. International Journal of Computer Applications. 139, 13 (April 2016), 4-8. DOI=10.5120/ijca2016909413

@article{ 10.5120/ijca2016909413,
author = { Angela Makolo and Taiwo Adigun },
title = { Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures },
journal = { International Journal of Computer Applications },
issue_date = { April 2016 },
volume = { 139 },
number = { 13 },
month = { April },
year = { 2016 },
issn = { 0975-8887 },
pages = { 4-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume139/number13/24548-2016909413/ },
doi = { 10.5120/ijca2016909413 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:40:49.801541+05:30
%A Angela Makolo
%A Taiwo Adigun
%T Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures
%J International Journal of Computer Applications
%@ 0975-8887
%V 139
%N 13
%P 4-8
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Clustering is one of the fundamental steps in analyzing gene expression data, and it works by comparing gene expression profiles or sample expression profiles. Comparing expression profiles requires, in addition to the clustering algorithm itself, a measure that quantifies how similar or dissimilar the objects under consideration are. Various clustering algorithms have been used to analyze gene expression data, and some of them report incorporating similarity measures such as Euclidean distance, Pearson correlation, and mutual information to improve their performance. This work reviews clustering algorithms reported for gene expression data analysis and examines the importance of different similarity measures for optimizing them. In all the works investigated, no clustering technique was applied directly to the gene expression data. Instead, the output of a similarity or dissimilarity measure (a distance matrix) serves as the input to the clustering technique. The works that did not use any of the popular proximity measures applied one or two approaches, such as Constrained Coherency (CoCo), Silhouette coefficient measurement, or normalization and discretization, to refine the gene expression data. These refinements improved cluster quality by speeding up the learning phase, reducing computational space, and handling noise effectively.
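
The pipeline described in the abstract, where a proximity measure produces a distance matrix that is then passed to a clustering technique, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy expression values are made up, the helper names (`euclidean`, `pearson_distance`, `distance_matrix`, `average_linkage`) are hypothetical, and average-linkage agglomerative clustering stands in for whichever algorithm a given study used.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation; near 0 for profiles with the same shape.

    Assumes neither profile is constant (otherwise the denominator is 0).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def distance_matrix(profiles, measure):
    """Symmetric matrix of pairwise distances between profiles."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = measure(profiles[i], profiles[j])
    return d

def average_linkage(dist, k):
    """Agglomerative clustering on a precomputed distance matrix.

    Repeatedly merges the two clusters with the smallest average
    pairwise distance until k clusters remain.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                avg = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg /= len(clusters[a]) * len(clusters[b])
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)

# Made-up toy data: 4 genes measured in 4 samples.
genes = [
    [1.0, 2.0, 3.0, 4.0],      # gene 0: rising, low magnitude
    [10.0, 20.0, 30.0, 40.0],  # gene 1: rising, high magnitude
    [4.0, 3.0, 2.0, 1.0],      # gene 2: falling, low magnitude
    [40.0, 30.0, 20.0, 10.0],  # gene 3: falling, high magnitude
]

# Same clustering routine, two different distance matrices as input.
by_euclidean = average_linkage(distance_matrix(genes, euclidean), 2)
by_pearson = average_linkage(distance_matrix(genes, pearson_distance), 2)
print(by_euclidean)
print(by_pearson)
```

On this toy input the Euclidean matrix groups genes by magnitude, while the Pearson-based matrix groups them by the shape of their profile, even though the clustering routine is identical. This suggests why the choice of proximity measure can matter as much as the choice of algorithm.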

Index Terms

Computer Science
Information Sciences

Keywords

Clustering Algorithms, Proximity Measure, Gene Expression Data, Distance Matrix, Microarray