
MGS-CM: A Multiple Scoring Gene Selection Technique for Cancer Classification using Microarrays

International Journal of Computer Applications
© 2011 by IJCA Journal
Volume 36 - Number 6
Year of Publication: 2011
Authors:
Dina Ahmed Salem
Rania Ahmed Abul Seoud
Hesham Arafat Ali
DOI: 10.5120/4498-6349

Dina Ahmed Salem, Rania Ahmed Abul Seoud and Hesham Arafat Ali. MGS-CM: A Multiple Scoring Gene Selection Technique for Cancer Classification using Microarrays. International Journal of Computer Applications 36(6):30-37, December 2011.

@article{key:article,
	author = {Dina Ahmed Salem and Rania Ahmed Abul Seoud and Hesham Arafat Ali},
	title = {MGS-CM: A Multiple Scoring Gene Selection Technique for Cancer Classification using Microarrays},
	journal = {International Journal of Computer Applications},
	year = {2011},
	volume = {36},
	number = {6},
	pages = {30-37},
	month = {December}
}

Abstract

Microarray technology lets researchers classify cancer samples without any prior biological knowledge. Microarray data are extremely high-dimensional, with thousands of genes measured on comparatively few samples, so gene selection techniques are essential. This paper proposes MGS-CM, a new filter-based multiple scoring gene selection technique. MGS-CM is combined with three classifiers to form three new classification systems (MGS-SVM, MGS-KNN and MGS-LDA), which are validated and evaluated on three microarray datasets. MGS-CM proved efficient: it extracts the most informative genes while reducing the original datasets by at least 99.6%. Two of the three proposed classification systems achieved perfect classification (100% accuracy) of the leukemia microarray samples. The third classified the lymphoma microarray samples with only two misclassifications, the lowest number recorded. Overall, the proposed systems achieved very good results and are expected to classify new, unlabeled samples reliably.
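The abstract does not say which scores MGS-CM combines or how it combines them. The sketch below therefore only illustrates the general idea of a filter-based, multiple-scoring gene selector feeding a classifier. It ranks every gene under two standard two-class scores, a Golub-style signal-to-noise ratio and Welch's t-statistic. It then averages the two ranks, keeps the top k genes, and classifies a new sample with k-nearest neighbours on those genes only. The choice of scores, the average-rank rule, the toy data, and the names `select_genes`, `knn_predict` and `ranks` are hypothetical illustrations, not the authors' method.

```python
"""Hypothetical sketch of filter-style multiple-scoring gene selection + KNN.

Assumption: SNR and Welch's t with average-rank fusion are illustrative
choices; they are NOT the published MGS-CM scoring scheme.
"""
from math import sqrt
from statistics import mean, stdev


def snr(a, b):
    # Golub-style signal-to-noise ratio |mu_a - mu_b| / (sd_a + sd_b);
    # the tiny epsilon avoids division by zero for constant genes.
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b) + 1e-12)


def welch_t(a, b):
    # Magnitude of Welch's t-statistic (unequal class variances).
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return abs(mean(a) - mean(b)) / (sqrt(va / len(a) + vb / len(b)) + 1e-12)


def ranks(scores):
    # Rank 0 = highest score (most informative gene).
    order = sorted(range(len(scores)), key=lambda g: -scores[g])
    r = [0] * len(scores)
    for pos, g in enumerate(order):
        r[g] = pos
    return r


def select_genes(X, y, k, scorers=(snr, welch_t)):
    """Indices of the k genes with the best average rank across all scorers.

    X: samples, each a list of expression values (one per gene).
    y: binary labels (0 or 1); each class needs at least 2 samples for stdev.
    """
    n_genes = len(X[0])
    all_ranks = []
    for score in scorers:
        per_gene = []
        for g in range(n_genes):
            a = [x[g] for x, lab in zip(X, y) if lab == 0]
            b = [x[g] for x, lab in zip(X, y) if lab == 1]
            per_gene.append(score(a, b))
        all_ranks.append(ranks(per_gene))
    avg = [mean(r[g] for r in all_ranks) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: avg[g])[:k]


def knn_predict(X_train, y_train, x, genes, k=3):
    # Majority vote of the k nearest training samples (Euclidean distance
    # computed on the selected genes only).
    dist = sorted(
        (sqrt(sum((xt[g] - x[g]) ** 2 for g in genes)), lab)
        for xt, lab in zip(X_train, y_train)
    )
    votes = [lab for _, lab in dist[:k]]
    return max(set(votes), key=votes.count)


# Hypothetical toy data: 6 samples x 4 genes; genes 0 and 2 separate the classes.
X = [
    [5.0, 1.0, 9.0, 3.0],
    [5.2, 2.0, 9.1, 1.0],
    [4.8, 1.5, 8.9, 2.0],
    [1.0, 1.2, 2.0, 2.5],
    [1.1, 1.8, 2.2, 1.5],
    [0.9, 1.4, 1.9, 2.0],
]
y = [0, 0, 0, 1, 1, 1]

genes = select_genes(X, y, k=2)
print(genes)  # → [2, 0]
print(knn_predict(X, y, [4.9, 1.0, 9.0, 2.0], genes))  # → 0
```

Each score enters the fusion only through its rank, so criteria on very different numeric scales can be merged without any normalisation. On a real microarray, k would be a tiny fraction of the gene count, which is the kind of reduction the abstract reports.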
