Research Article

Hybrid Correlation based Gene Selection for Accurate Cancer Classification of Gene Expression Data

by Vibhav Prakash Singh, Singh Gaurav Arvind, Arindam G Mahapatra
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 43 - Number 14
Year of Publication: 2012
Authors: Vibhav Prakash Singh, Singh Gaurav Arvind, Arindam G Mahapatra
DOI: 10.5120/6170-8591

Vibhav Prakash Singh, Singh Gaurav Arvind, Arindam G Mahapatra. Hybrid Correlation based Gene Selection for Accurate Cancer Classification of Gene Expression Data. International Journal of Computer Applications. 43, 14 (April 2012), 13-18. DOI=10.5120/6170-8591

@article{ 10.5120/6170-8591,
author = { Vibhav Prakash Singh, Singh Gaurav Arvind, Arindam G Mahapatra },
title = { Hybrid Correlation based Gene Selection for Accurate Cancer Classification of Gene Expression Data },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 43 },
number = { 14 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 13-18 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume43/number14/6170-8591/ },
doi = { 10.5120/6170-8591 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:33:23.654239+05:30
%A Vibhav Prakash Singh
%A Singh Gaurav Arvind
%A Arindam G Mahapatra
%T Hybrid Correlation based Gene Selection for Accurate Cancer Classification of Gene Expression Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 43
%N 14
%P 13-18
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Microarray data are widely used for cancer classification, where the goal is to predict the category of a sample from its gene expression profile. A DNA microarray is a gene chip that records expression levels for a very large number of genes across a relatively small number of samples. However, only a small number of these genes contribute to accurate cancer classification. The challenge is therefore to identify a small subset of informative genes that carries the maximum amount of information about the class while minimizing classification error. In this paper, we propose a hybrid negative correlated method, which combines the features from various correlation based feature selection techniques to generate mutually exclusive informative feature sets. We test the effectiveness of the proposed approach using a neural network based classifier on two benchmark gene expression data sets: the colon dataset and the leukemia dataset. The results are encouraging: features selected by the hybrid negative correlated method give better recognition accuracy than positive correlated and other negative correlated features.
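
As a rough illustration of the kind of correlation based gene selection the abstract describes, the Python sketch below scores each gene by its correlation with the class label, keeps the most negatively correlated genes under two different measures, and merges the two sets. The abstract does not say which correlation measures are combined or how they are merged, so the use of Pearson and a rank-based (Spearman-style) correlation, the union rule, the parameter `k`, and every function name here are assumptions made for illustration, not the authors' method.

```python
import numpy as np

def pearson_scores(X, y):
    """Pearson correlation of each gene (column of X) with the label vector y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant genes get a score of 0 instead of a division by zero.
    return (Xc * yc[:, None]).sum(axis=0) / np.where(denom == 0, 1.0, denom)

def rank_transform(X):
    """Replace each column by its ranks (ties broken by position; fine for a sketch)."""
    return np.argsort(np.argsort(X, axis=0), axis=0).astype(float)

def rank_scores(X, y):
    """Rank-based correlation: Pearson on ranked expression values.
    For a two-class label and untied expression values this equals Spearman's rho."""
    return pearson_scores(rank_transform(X), y)

def most_negative(scores, k):
    """Indices of the k genes most negatively correlated with the label."""
    return np.argsort(scores)[:k]

def hybrid_negative_selection(X, y, k):
    """Assumed hybrid rule: union of the top-k negative genes from both measures."""
    by_pearson = most_negative(pearson_scores(X, y), k)
    by_rank = most_negative(rank_scores(X, y), k)
    return np.union1d(by_pearson, by_rank)

# Synthetic usage example: 20 samples, 50 genes, two classes coded 0 and 1.
rng = np.random.default_rng(0)
labels = np.array([0.0] * 10 + [1.0] * 10)
expression = rng.normal(size=(20, 50))
selected_genes = hybrid_negative_selection(expression, labels, k=5)
reduced = expression[:, selected_genes]  # input for a downstream classifier
```

The reduced matrix would then be passed to a classifier; the paper uses a neural network, which is omitted here to keep the sketch short.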

Index Terms

Computer Science
Information Sciences

Keywords

DNA Microarray, Classification, Correlation, Neural Network, Backpropagation Algorithm