Research Article

Two Phase Integrated Rule based Model (TPC-IRBM) for Clustering of Gene Expression Data of CA1 Region of Rat Hippocampus

by Sudhakar Tripathi, R. B. Mishra
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 84 - Number 6
Year of Publication: 2013
Authors: Sudhakar Tripathi, R. B. Mishra
DOI: 10.5120/14580-2803

Sudhakar Tripathi, R. B. Mishra. Two Phase Integrated Rule based Model (TPC-IRBM) for Clustering of Gene Expression Data of CA1 Region of Rat Hippocampus. International Journal of Computer Applications. 84, 6 (December 2013), 23-29. DOI=10.5120/14580-2803

@article{ 10.5120/14580-2803,
author = { Sudhakar Tripathi, R. B. Mishra },
title = { Two Phase Integrated Rule based Model (TPC-IRBM) for Clustering of Gene Expression Data of CA1 Region of Rat Hippocampus },
journal = { International Journal of Computer Applications },
issue_date = { December 2013 },
volume = { 84 },
number = { 6 },
month = { December },
year = { 2013 },
issn = { 0975-8887 },
pages = { 23-29 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume84/number6/14580-2803/ },
doi = { 10.5120/14580-2803 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:00:13.167868+05:30
%A Sudhakar Tripathi
%A R. B. Mishra
%T Two Phase Integrated Rule based Model (TPC-IRBM) for Clustering of Gene Expression Data of CA1 Region of Rat Hippocampus
%J International Journal of Computer Applications
%@ 0975-8887
%V 84
%N 6
%P 23-29
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper proposes a semi-supervised clustering model, TPC-IRBM (Two Phase Clustering - Integrated Rule Based Model), for clustering large data sets such as gene expression data. TPC-IRBM clusters the gene expression data set in two phases and builds on the rule-based models CRT, C5, CHAID, and QUEST. In the first phase, 30% of the data (a proportion that may vary) is extracted and clustered with a suitable heuristic or neural-network-based clustering technique to prepare training, testing, and validation data (TTV data). The output of the first phase is used to build the CRT, C5, CHAID, and QUEST models and to generate the rule base that fits the TTV data. The proposed model is then constructed by selecting and integrating the quality rules of these models, using qualifying criteria for each cluster. The proposed model contains considerably more quality rules than CRT, C5, CHAID, or QUEST individually, and its accuracy is better than that of those models. In some cases neural-network-based models perform slightly better, but at a very high cost in complexity for very large data sets.
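The rule selection and integration step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the rule format or the qualifying criteria, so everything below is an assumption made for illustration: the names `Rule`, `draw_subset`, `rule_precision`, `integrate_rules`, and `min_precision` are hypothetical, per-rule precision on labelled validation rows stands in for the unspecified qualifying criterion, and the hand-written threshold rules stand in for rules that CRT, C5, CHAID, or QUEST would generate.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Sample = Sequence[float]


@dataclass
class Rule:
    source: str                     # model that produced the rule (illustrative tag)
    cluster: int                    # cluster label the rule predicts
    test: Callable[[Sample], bool]  # rule antecedent


def draw_subset(data: Sequence, fraction: float = 0.30, seed: int = 0) -> List:
    """Phase 1 (partial): draw the fraction of rows used to prepare TTV data."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(data)))
    return rng.sample(list(data), k)


def rule_precision(rule: Rule, rows: Sequence[Sample], labels: Sequence[int]) -> float:
    """Share of rows matched by the rule that carry the rule's cluster label."""
    matched = [lab for row, lab in zip(rows, labels) if rule.test(row)]
    if not matched:
        return 0.0
    return sum(lab == rule.cluster for lab in matched) / len(matched)


def integrate_rules(rule_sets: Sequence[Sequence[Rule]],
                    rows: Sequence[Sample],
                    labels: Sequence[int],
                    min_precision: float = 0.9) -> Dict[int, List[Rule]]:
    """Phase 2 (partial): keep, per cluster, every rule that meets the criterion.

    The precision threshold is an assumed stand-in for the paper's
    qualifying criteria, which the abstract does not specify.
    """
    merged: Dict[int, List[Rule]] = {}
    for rules in rule_sets:
        for rule in rules:
            if rule_precision(rule, rows, labels) >= min_precision:
                merged.setdefault(rule.cluster, []).append(rule)
    return merged


# Toy validation data: one feature per row, cluster labels from phase 1.
rows = [(0.1,), (0.2,), (0.3,), (0.6,), (0.7,), (0.9,)]
labels = [0, 0, 0, 1, 1, 1]

# Hand-written stand-ins for the rule bases of two different models.
model_a = [Rule("A", 0, lambda r: r[0] < 0.5), Rule("A", 1, lambda r: r[0] >= 0.5)]
model_b = [Rule("B", 0, lambda r: r[0] < 0.8), Rule("B", 1, lambda r: r[0] >= 0.4)]

merged = integrate_rules([model_a, model_b], rows, labels)
for cluster, rules in sorted(merged.items()):
    print(cluster, [rule.source for rule in rules])
```

In this toy run the cluster-0 rule from model B matches two rows of cluster 1 and is rejected, while both cluster-1 rules pass, so the integrated rule base ends up with more qualifying rules for cluster 1 than either model supplies alone.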

Index Terms

Computer Science
Information Sciences

Keywords

Gene expression clustering, semi-supervised clustering, integrated rule based model, two phase clustering, CA1 region gene expression clustering of rat hippocampus.