Research Article

Breast Cancer Microarray Dataset with the Decision Tree Classifier and Efficient Scaling Techniques

by Maha A. Hana, Elsayed Badr, Sally Gamal, Naglaa Shehata
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 183 - Number 4
Year of Publication: 2021
Authors: Maha A. Hana, Elsayed Badr, Sally Gamal, Naglaa Shehata
10.5120/ijca2021921324

Maha A. Hana, Elsayed Badr, Sally Gamal, Naglaa Shehata. Breast Cancer Microarray Dataset with the Decision Tree Classifier and Efficient Scaling Techniques. International Journal of Computer Applications. 183, 4 (May 2021), 13-17. DOI=10.5120/ijca2021921324

@article{ 10.5120/ijca2021921324,
author = { Maha A. Hana, Elsayed Badr, Sally Gamal, Naglaa Shehata },
title = { Breast Cancer Microarray Dataset with the Decision Tree Classifier and Efficient Scaling Techniques },
journal = { International Journal of Computer Applications },
issue_date = { May 2021 },
volume = { 183 },
number = { 4 },
month = { May },
year = { 2021 },
issn = { 0975-8887 },
pages = { 13-17 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume183/number4/31915-2021921324/ },
doi = { 10.5120/ijca2021921324 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:15:51.317489+05:30
%A Maha A. Hana
%A Elsayed Badr
%A Sally Gamal
%A Naglaa Shehata
%T Breast Cancer Microarray Dataset with the Decision Tree Classifier and Efficient Scaling Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 183
%N 4
%P 13-17
%D 2021
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Badr et al. [1] proposed efficient scaling techniques (EST) for the support vector machine and evaluated them on the Wisconsin data set from the UCI Machine Learning Repository, which has 569 rows and 33 columns. In this work, we evaluate whether the results of Badr et al. [1] remain valid when a different dataset, a different classifier and a dimensionality reduction tool are used. To this end, the decision tree algorithm is applied to a breast cancer microarray dataset (BCMD) that contains 289 patients and 35981 attributes. We use principal component analysis (PCA) to reduce the number of attributes. We also propose new scaling techniques to improve the accuracy of the decision tree algorithm. Experimental results show that the decision tree algorithm with the new scaling techniques (equilibration, geometric mean and arithmetic mean) achieves accuracies of 84.98%, 80.65% and 79.96%, respectively, compared with 75.44%, 76.85% and 78.93% for the traditional normalization techniques (normalization [0, 1], normalization [-1, 1] and standard normalization).
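
As a rough illustration of the three scaling techniques named in the abstract, the following Python sketch uses the definitions common in the linear-programming scaling literature (for example references 10 and 14): each row, then each column, is divided by a factor computed from its nonzero absolute values. The abstract does not define the techniques, so the function names, the row-then-column order and the single-pass application are assumptions, not the authors' implementation.

```python
import math


def _scale(matrix, factor):
    """Divide each row, then each column, by factor(nonzero absolute values).

    `matrix` is a list of equal-length rows (samples x attributes).
    Rows or columns that contain only zeros are left unchanged.
    """
    # Row pass: one scaling factor per row.
    row_scaled = []
    for row in matrix:
        nonzero = [abs(v) for v in row if v != 0]
        r = factor(nonzero) if nonzero else 1.0
        row_scaled.append([v / r for v in row])

    # Column pass: factors are computed on the row-scaled matrix.
    col_factors = []
    for j in range(len(row_scaled[0])):
        nonzero = [abs(row[j]) for row in row_scaled if row[j] != 0]
        col_factors.append(factor(nonzero) if nonzero else 1.0)

    return [[v / c for v, c in zip(row, col_factors)] for row in row_scaled]


def equilibration(matrix):
    """Scale by the largest absolute value in each row, then each column."""
    return _scale(matrix, max)


def geometric_mean(matrix):
    """Scale by sqrt(max * min) of the nonzero absolute values."""
    return _scale(matrix, lambda nz: math.sqrt(max(nz) * min(nz)))


def arithmetic_mean(matrix):
    """Scale by the mean of the nonzero absolute values."""
    return _scale(matrix, lambda nz: sum(nz) / len(nz))


if __name__ == "__main__":
    # Toy 2 x 2 matrix; a real input would be samples x (PCA-reduced) attributes.
    X = [[1.0, 200.0], [4.0, 50.0]]
    for technique in (equilibration, geometric_mean, arithmetic_mean):
        print(technique.__name__, technique(X))
```

In a pipeline like the one the abstract describes, a scaled matrix of this kind would presumably be passed to PCA and then to the decision tree classifier. Where the scaling step sits relative to PCA is not stated in the abstract.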

References
  1. Elsayed Badr, Mustafa Abdul Salam, Sultan Almotairi, Hagar Ahmed, "From Linear Programming Approach to Metaheuristic Approach: Scaling Techniques", Complexity, vol. 2021, Article ID 9384318, 10 pages, 2021. https://doi.org/10.1155/2021/9384318
  2. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A, “Global Cancer Statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” CA Cancer J Clin, in press.
  3. Tomlin, J. A. 1975. On scaling linear programming problems. Mathematical Programming Studies 4, 146-166. DOI= http://dx.doi.org/10.1007/BFb0120718.
  4. Curtis, A. R. and Reid, J. K. 1972. On the automatic scaling of matrices for Gaussian elimination. IMA Journal of Applied Mathematics 10, 1, 118-124. DOI= http://dx.doi.org/10.1093/imamat/10.1.118
  5. Fulkerson, D. R. and Wolfe, P. 1962. An algorithm for scaling matrices. SIAM Review 4, 2, 142-146. DOI= http://dx.doi.org/10.1137/1004032.
  6. http://sbcb.inf.ufrgs.br/cumida (accessed May 01, 2021).
  7. J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2000.
  8. Larsson, T. 1993. On scaling linear programs-Some experimental results. Optimization 27, 4, 335-373. DOI= http://dx.doi.org/10.1080/02331939308843895
  9. de Buchet, J. 1966. Experiments and statistical data on the solving of large-scale linear programs. In Proceedings of the Fourth International Conference on Operational Research, Hertz, D. A. and Melese, J., Eds. Wiley-Interscience, New York, 3-13.
  10. Elble, J. M. and Sahinidis, N. V. 2012. Scaling linear optimization problems prior to application of the simplex method. Computational Optimization and Applications 52, 2, 345-371. DOI= http://dx.doi.org/10.1007/s10589-011-9420-4
  11. Benichou, M., Gauthier, J. M., Hentges, G., and Ribiere, G. 1977. The efficient solution of large-scale linear programming problems-Some algorithmic techniques and computational results. Mathematical Programming 13, 1, 280-322. DOI= http://dx.doi.org/10.1007/BF01584344
  12. Ploskas, N. and Samaras N. 2013. A Computational Comparison of Scaling Techniques for Linear Optimization Problems on a GPU. Optimization Methods and Software. Paper under review.
  13. Triantafyllidis, C. and Samaras, N. “Three nearly scaling-invariant versions of an exterior point algorithm for linear programming”, Optimization. 2014, 64(10), 2163–2181.
  14. Ploskas, N. and Samaras, N. “A computational comparison of scaling techniques for linear optimization problems on a graphical processing unit”, International Journal of Computer Mathematics. 2015, 92(2), 319–336.
  15. E. M. Badr and H. Elgendy (2020), "A Hybrid water cycle - particle swarm optimization for solving the fuzzy underground water confined steady flow", Indonesian Journal of Electrical Engineering and Computer Science, Vol 19, No 1, 2020.
  16. Elsayed M. Badr and Mahmoud I. Moussa (2019), An upper bound of radio k-coloring problem and its integer linear programming model, Wireless Networks, First Online: 18 March 2019.
  17. Badr, E.; Aloufi, K. A Robot's Response Acceleration Using the Metric Dimension Problem. Preprints 2019, 2019110194 (doi:10.20944/preprints201911.0194.v1).
  18. E.S. Badr, K. Paparrizos, Baloukas Thanasis and G. Varkas (2006), Some computational results on the efficiency of an exterior point algorithm, in Proc. of the 18th National Conference of Hellenic Operational Research Society (HELORS), 15-17 June, Rio, Greece, pp. 1103-1115
  19. E. S. Badr, K. Paparrizos, N. Samaras, and A. Sifaleras (2005), On the Basis Inverse of the Exterior Point Simplex Algorithm, in Proc. of the 17th National Conference of Hellenic Operational Research Society (HELORS), 16-18 June, Rio, Greece, pp. 677-687.
  20. E.S. Badr, M. Moussa, K. Paparrizos, N. Samaras, and A. Sifaleras, Some computational results on MPI parallel implementation of dense simplex method, World Academy of Science, Engineering and Technology (WASET), 23, 2008,778–781.
  21. E. M. Badr and Sultan Almotairi (2019), "On a Dual Direct Cosine Simplex Type Algorithm and Its Computational Behavior", Mathematical Problems in Engineering, Volume 2020, Article ID 7361092, 8 pages. https://doi.org/10.1155/2020/7361092
  22. Chin-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin (2010). A practical guide to support vector classification. Technical Report, National Taiwan University.
  23. Chicco D (December 2017). "Ten quick tips for machine learning in computational biology". BioData Mining. 10 (35): 35. doi:10.1186/s13040-017-0155-3. PMC 5721660. PMID 29234465.
  24. Vapnik, V.N. “The nature of statistical learning theory”, Springer: New York, 1995.
  25. Chang, C.C. and C.J. Lin, LIBSVM: a library for support vector machines. 2001, Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm
  26. Salzberg, S. L., On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Min Knowl Discov 1(3):317–328, 1997.
  27. EM Badr, MA Salam, M Ali, H Ahmed, Social Media Sentiment Analysis using Machine Learning and Optimization Techniques, International Journal of Computer Applications (0975 – 8887) Volume 178 – No. 41, August 2019.
  28. Elsayed Badr, Mustafa Abdulsalam and Hagar Ahmed. "The impact of scaling on Support Vector Machine in Breast Cancer Diagnosis". International Journal of Computer Applications 175(19):15-19, September 2020
Index Terms

Computer Science
Information Sciences

Keywords

Machine Learning, Breast Cancer, Decision Tree, scaling techniques