Research Article

Diagnosis of Breast Cancer using Decision Tree Models and SVM

by Alaa. M. Elsayad, H. A. Elsalamony
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 83 - Number 5
Year of Publication: 2013
Authors: Alaa. M. Elsayad, H. A. Elsalamony
10.5120/14445-2604

Alaa. M. Elsayad, H. A. Elsalamony. Diagnosis of Breast Cancer using Decision Tree Models and SVM. International Journal of Computer Applications. 83, 5 (December 2013), 19-29. DOI=10.5120/14445-2604

@article{ 10.5120/14445-2604,
author = { Alaa. M. Elsayad, H. A. Elsalamony },
title = { Diagnosis of Breast Cancer using Decision Tree Models and SVM },
journal = { International Journal of Computer Applications },
issue_date = { December 2013 },
volume = { 83 },
number = { 5 },
month = { December },
year = { 2013 },
issn = { 0975-8887 },
pages = { 19-29 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume83/number5/14445-2604/ },
doi = { 10.5120/14445-2604 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:58:35.394220+05:30
%A Alaa. M. Elsayad
%A H. A. Elsalamony
%T Diagnosis of Breast Cancer using Decision Tree Models and SVM
%J International Journal of Computer Applications
%@ 0975-8887
%V 83
%N 5
%P 19-29
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Breast cancer is the most common type of cancer in women and the second leading cause of cancer deaths among them. Disease diagnosis is one of the applications where data mining tools are producing successful results. Decision trees are a popular and effective data mining classification approach. They can generate understandable classification rules, which are an efficient means of transferring knowledge to physicians and medical specialists. In essence, they provide a way to find rules that can be evaluated to separate the input samples into one of several groups without having to state the functional relationship directly. The objective of this paper is to examine the performance of recently developed decision tree modeling algorithms and to compare it with that achieved by a radial basis function kernel support vector machine (RBF-SVM) on the diagnosis of breast cancer using a cytologically proven tumor dataset. Four decision tree models have been evaluated: Chi-squared Automatic Interaction Detection (CHAID), Classification and Regression tree (C&R), Quick Unbiased Efficient Statistical Tree (QUEST), and Ross Quinlan's new decision tree model, C5.0. The goal is to classify a tumor as either benign or malignant, using decision tree models, based on cell descriptions gathered by microscopic examination. The proposed algorithm imputes the missing values with the C&R tree. Then, the performance of the five models is measured by three statistical measures: classification accuracy, sensitivity, and specificity.
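The three measures named above can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `diagnostic_metrics`, the label encoding (1 = malignant, 0 = benign), and the toy labels are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def diagnostic_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for a binary classifier.

    Assumption: `positive` marks the malignant class; every other label
    is treated as benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # malignant correctly flagged
        elif truth == positive:
            fn += 1  # malignant missed
        elif pred == positive:
            fp += 1  # benign wrongly flagged
        else:
            tn += 1  # benign correctly cleared

    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        # share of all cases classified correctly
        "accuracy": (tp + tn) / total if total else nan,
        # share of malignant cases that were detected
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # share of benign cases that were cleared
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy labels, invented for illustration only (not taken from the dataset).
example_true = [1, 1, 1, 0, 0, 0, 0, 1, 0]
example_pred = [1, 0, 1, 0, 0, 1, 0, 1, 0]
example_metrics = diagnostic_metrics(example_true, example_pred)
print(example_metrics)
```

Reporting sensitivity and specificity alongside accuracy matters in a diagnostic setting because a missed malignant tumor and a false alarm on a benign one carry different costs, and accuracy alone does not separate them.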

Index Terms

Computer Science
Information Sciences

Keywords

Breast cancer classification, decision tree algorithms, SVM, missing data imputation