Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures

Angela Makolo; Taiwo Adigun

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 139 - Number 13

Year of Publication: 2016

Authors: Angela Makolo, Taiwo Adigun

DOI: 10.5120/ijca2016909413

Angela Makolo, Taiwo Adigun. Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures. International Journal of Computer Applications. 139, 13 (April 2016), 4-8. DOI=10.5120/ijca2016909413

@article{10.5120/ijca2016909413,
  author = {Angela Makolo and Taiwo Adigun},
  title = {Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2016},
  volume = {139},
  number = {13},
  month = {April},
  year = {2016},
  issn = {0975-8887},
  pages = {4-8},
  numpages = {5},
  url = {https://ijcaonline.org/archives/volume139/number13/24548-2016909413/},
  doi = {10.5120/ijca2016909413},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:40:49.801541+05:30
%A Angela Makolo
%A Taiwo Adigun
%T Optimization of Clustering Algorithms for Gene Expression Data Analysis using Distance Measures
%J International Journal of Computer Applications
%@ 0975-8887
%V 139
%N 13
%P 4-8
%D 2016
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Clustering is one of the fundamental processes in analyzing gene expression data, and it works essentially by comparing gene expression profiles or sample expression profiles. Comparing expression profiles requires a measure, separate from the clustering algorithm itself, that quantifies how similar or dissimilar the objects under consideration are. Various clustering algorithms have been used to analyze gene expression data, and several of them report relying on similarity measures such as Euclidean distance, Pearson correlation and mutual information for their performance. This work examined clustering algorithms reported for gene expression data analysis and the importance of different similarity measures in optimizing them. It was observed that none of the clustering techniques in the works investigated was applied directly to the gene expression data. Instead, the output of a similarity or dissimilarity measure (a distance matrix) serves as the input to the clustering technique. The works that did not use any of the popular proximity measures applied one or two other approaches, such as Constrained Coherency (CoCo), Silhouette coefficient measurement, and normalization and discretization, to refine the gene expression data. These approaches improved cluster quality by speeding up the learning phase, reducing computational space and handling noise effectively.
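The observation that a proximity measure first turns expression profiles into a distance matrix, and that the clustering algorithm then works only on that matrix, can be illustrated with a small sketch. The following is a minimal, hypothetical Python example under stated assumptions, not the method of any of the surveyed works: it uses four invented toy gene profiles (g1 to g4), compares Euclidean distance with a Pearson-correlation distance (1 - r), clusters with naive average-linkage agglomerative clustering (chosen here only for illustration), and scores each partition with the mean Silhouette coefficient. All function names (`euclidean`, `pearson_distance`, `distance_matrix`, `average_linkage`, `silhouette`) are illustrative.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for profiles of the same shape, 2 for opposite shapes.
    Undefined (division by zero) if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(sxx * syy)


def distance_matrix(profiles, measure):
    """Symmetric n x n matrix of pairwise distances; this is all the clustering step sees."""
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = measure(profiles[i], profiles[j])
    return d


def average_linkage(d, k):
    """Naive agglomerative clustering with average linkage on the matrix d.
    Returns one cluster label per object (suitable only for toy sizes)."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            link /= len(clusters[a]) * len(clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair (a < b)
        del clusters[b]
    labels = [0] * len(d)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(d, labels):
    """Mean Silhouette coefficient s(i) = (b - a) / max(a, b); assumes >= 2 clusters."""
    n = len(d)
    total = 0.0
    for i in range(n):
        same = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


# Hypothetical toy profiles over four conditions.
profiles = [
    [1.0, 2.0, 3.0, 4.0],      # g1: rising, low level
    [11.0, 12.0, 13.0, 14.0],  # g2: rising, high level
    [4.0, 3.0, 2.0, 1.0],      # g3: falling, low level
    [14.0, 13.0, 12.0, 11.0],  # g4: falling, high level
]

for name, measure in [("Euclidean", euclidean), ("Pearson", pearson_distance)]:
    d = distance_matrix(profiles, measure)
    labels = average_linkage(d, k=2)
    print(name, labels, round(silhouette(d, labels), 2))
# Euclidean [0, 1, 0, 1] 0.78
# Pearson [0, 0, 1, 1] 1.0
```

With these toy profiles the same clustering routine yields different partitions depending only on the distance matrix it receives: Euclidean distance groups the genes by expression level ({g1, g3} and {g2, g4}), while the Pearson distance groups them by profile shape ({g1, g2} and {g3, g4}). The two silhouette values are each computed on their own distance matrix, so they are not directly comparable across measures. For real data, a library implementation (for example SciPy's hierarchical clustering routines) would replace the naive loop used here.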

Index Terms

Computer Science

Information Sciences

Keywords

Clustering Algorithms, Proximity Measure, Gene Expression Data, Distance Matrix, Microarray