Research Article

Iterative Search with Incremental MSR Difference Threshold for Biclustering Gene Expression Data

by Shyama Das, Sumam Mary Idicula
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 18
Year of Publication: 2010
Authors: Shyama Das, Sumam Mary Idicula
10.5120/385-576

Shyama Das, Sumam Mary Idicula. Iterative Search with Incremental MSR Difference Threshold for Biclustering Gene Expression Data. International Journal of Computer Applications. 1, 18 (February 2010), 35-43. DOI=10.5120/385-576

@article{ 10.5120/385-576,
author = { Shyama Das and Sumam Mary Idicula },
title = { Iterative Search with Incremental MSR Difference Threshold for Biclustering Gene Expression Data },
journal = { International Journal of Computer Applications },
issue_date = { February 2010 },
volume = { 1 },
number = { 18 },
month = { February },
year = { 2010 },
issn = { 0975-8887 },
pages = { 35-43 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume1/number18/385-576/ },
doi = { 10.5120/385-576 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:46:41.031583+05:30
%A Shyama Das
%A Sumam Mary Idicula
%T Iterative Search with Incremental MSR Difference Threshold for Biclustering Gene Expression Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 1
%N 18
%P 35-43
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The goal of biclustering a gene expression data matrix is to find a submatrix in which the genes show highly correlated activity across all of the conditions in that submatrix. A measure called the Mean Squared Residue (MSR) is used to evaluate the coherence of rows and columns within a submatrix simultaneously. This paper develops a new method for biclustering gene expression data. In the first step, high-quality bicluster seeds are generated using the K-Means clustering algorithm. More genes and conditions (nodes) are then added to the bicluster. Before a node is added, the MSR of the bicluster, X, is calculated; after the node is added, the MSR is calculated again as Y. The added node is removed if Y minus X is greater than the MSR difference threshold, or if Y is greater than δ (the MSR threshold), which depends on the dataset. The MSR difference threshold is different for the gene list and the condition list, and it also depends on the dataset. Proper values must be identified through experimentation in order to obtain large biclusters. Because a suitable MSR difference threshold is very difficult to calculate directly, the algorithm uses an iterative search in which the threshold is initialized with a small value and incremented after each iteration. A bicluster with a unique structural appearance is obtained from the Yeast dataset. This shows that the newly introduced concept of an MSR difference threshold results in high-quality biclusters. The results obtained on benchmark datasets show that this algorithm performs better than many existing biclustering algorithms.
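
The node-addition rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The MSR follows the standard Cheng and Church definition; the function names (`msr`, `grow_rows`, `iterative_search`), the candidate ordering, the fixed increment `step`, and the fixed iteration count `n_iter` are all assumptions made for illustration. Only genes (rows) are grown here, and the K-Means seeding step is omitted; conditions would be handled the same way with their own difference threshold.

```python
import numpy as np

def msr(data, rows, cols):
    """Mean squared residue (Cheng and Church) of the submatrix data[rows, cols]."""
    sub = data[np.ix_(rows, cols)]
    row_mean = sub.mean(axis=1, keepdims=True)
    col_mean = sub.mean(axis=0, keepdims=True)
    residue = sub - row_mean - col_mean + sub.mean()
    return float((residue ** 2).mean())

def grow_rows(data, rows, cols, candidates, delta, diff_threshold):
    """Try each candidate gene; keep it only if both MSR tests pass."""
    rows = list(rows)
    for g in candidates:
        if g in rows:
            continue
        x = msr(data, rows, cols)          # MSR before adding the node
        y = msr(data, rows + [g], cols)    # MSR after adding the node
        if y - x > diff_threshold or y > delta:
            continue                       # reject (i.e. delete) the node
        rows.append(g)
    return rows

def iterative_search(data, seed_rows, cols, delta, start, step, n_iter):
    """Repeat the growth pass while raising the MSR difference threshold.

    start, step and n_iter are hypothetical parameters; the abstract only
    states that the threshold starts small and is incremented each iteration.
    """
    rows = list(seed_rows)
    diff_threshold = start
    for _ in range(n_iter):
        candidates = range(data.shape[0])
        rows = grow_rows(data, rows, cols, candidates, delta, diff_threshold)
        diff_threshold += step
    return rows
```

Starting with a small difference threshold means that early passes admit only nodes that barely change the MSR, and later passes relax that test gradually while δ still bounds the overall MSR of the bicluster.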

Index Terms

Computer Science
Information Sciences

Keywords

Biclustering
Gene expression data
K-Means clustering
Mean Squared Residue