Research Article

A Comparative Study on Bioinformatics Feature Selection and Classification

by Amal Tamer, Amr Badr
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 43 - Number 3
Year of Publication: 2012
Authors: Amal Tamer, Amr Badr
10.5120/6081-8219

Amal Tamer, Amr Badr. A Comparative Study on Bioinformatics Feature Selection and Classification. International Journal of Computer Applications. 43, 3 (April 2012), 5-8. DOI=10.5120/6081-8219

@article{ 10.5120/6081-8219,
author = { Amal Tamer and Amr Badr },
title = { A Comparative Study on Bioinformatics Feature Selection and Classification },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 43 },
number = { 3 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 5-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume43/number3/6081-8219/ },
doi = { 10.5120/6081-8219 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:32:24.208098+05:30
%A Amal Tamer
%A Amr Badr
%T A Comparative Study on Bioinformatics Feature Selection and Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 43
%N 3
%P 5-8
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper applies supervised machine learning to the classification of colon cancer gene expression data. Established feature selection techniques based on principal component analysis (PCA), independent component analysis (ICA), genetic algorithms (GA), and support vector machines (SVM) are applied to this data set for the first time to support learning and classification. Different classifiers are implemented to investigate the impact of combining feature selection with classification; they include K-Nearest Neighbors (KNN) and SVM. The comparative results demonstrate that effective feature selection is essential when developing classifiers for high-dimensional domains. They also show that feature selection increases computational efficiency while improving classification accuracy.
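The workflow the abstract describes (select features, train a classifier, score it on held-out samples) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic data in place of the colon cancer set, and it uses a simple filter score (absolute difference of class means divided by the pooled standard deviation) as a stand-in for the PCA-, ICA-, GA- and SVM-based selectors studied in the paper. All function names, the 70/30 hold-out split, `k=3`, and the data dimensions are hypothetical choices made for illustration.

```python
import math
import random


def make_data(n_per_class=40, n_features=500, informative=(0, 1, 2, 3, 4),
              shift=4.0, seed=0):
    """Synthetic stand-in for expression data (assumption, not the real set):
    a few informative 'genes' plus many pure-noise ones."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in informative:
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def hold_out_split(X, y, test_fraction=0.3, seed=1):
    """Shuffle once and keep a fixed fraction of samples aside for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def filter_scores(X, y):
    """Score each feature by |mean0 - mean1| / pooled standard deviation."""
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in (0, 1)]
        means = [sum(g) / len(g) for g in groups]
        variances = [sum((v - m) ** 2 for v in g) / (len(g) - 1)
                     for g, m in zip(groups, means)]
        pooled = math.sqrt((variances[0] + variances[1]) / 2) or 1e-12
        scores.append(abs(means[0] - means[1]) / pooled)
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    ranked = sorted(range(len(scores)), key=lambda j: -scores[j])
    return sorted(ranked[:k])


def project(rows, cols):
    """Keep only the chosen feature columns."""
    return [[r[j] for j in cols] for r in rows]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    votes = sum(y_train[i] for i in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(X_train, y_train, X_test, y_test, cols, k=3):
    """Hold-out accuracy of KNN restricted to the feature columns in cols."""
    X_tr, X_te = project(X_train, cols), project(X_test, cols)
    hits = sum(knn_predict(X_tr, y_train, x, k) == t
               for x, t in zip(X_te, y_test))
    return hits / len(y_test)


X, y = make_data()
X_tr, y_tr, X_te, y_te = hold_out_split(X, y)

# Scores are computed on the training split only, so the held-out
# samples never influence which features are kept.
selected = select_top_k(filter_scores(X_tr, y_tr), k=5)

acc_all = accuracy(X_tr, y_tr, X_te, y_te, list(range(len(X[0]))))
acc_sel = accuracy(X_tr, y_tr, X_te, y_te, selected)
print("selected features:", selected)
print("accuracy, all features:", acc_all)
print("accuracy, selected features:", acc_sel)
```

The script prints both accuracies so the full feature set and the reduced one can be compared directly. The reduced run also computes distances over 5 columns instead of 500, which is the kind of computational saving the abstract refers to. The exact figures depend on the seeds and on the synthetic data, so they say nothing about the paper's reported results.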

Index Terms

Computer Science
Information Sciences

Keywords

Hold-Out, PCA, SVM, KNN, ICA, Features, Classification, Feature Selection, Accuracy, Colon Cancer