Research Article

On the Classification of Imbalanced Datasets

by Arun Kumar M.n, H. S. Sheshadri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 44 - Number 8
Year of Publication: 2012
Authors: Arun Kumar M.n, H. S. Sheshadri
DOI: 10.5120/6280-8449

Arun Kumar M.n, H. S. Sheshadri. On the Classification of Imbalanced Datasets. International Journal of Computer Applications. 44, 8 (April 2012), 1-7. DOI=10.5120/6280-8449

@article{ 10.5120/6280-8449,
author = { Arun Kumar M.n, H. S. Sheshadri },
title = { On the Classification of Imbalanced Datasets },
journal = { International Journal of Computer Applications },
issue_date = { April 2012 },
volume = { 44 },
number = { 8 },
month = { April },
year = { 2012 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume44/number8/6280-8449/ },
doi = { 10.5120/6280-8449 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:34:58.661182+05:30
%A Arun Kumar M.n
%A H. S. Sheshadri
%T On the Classification of Imbalanced Datasets
%J International Journal of Computer Applications
%@ 0975-8887
%V 44
%N 8
%P 1-7
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The classification of imbalanced data sets has received considerable attention in recent research. When the classes are imbalanced, a classifier naturally tends to favour the majority class. In this paper we briefly review the state-of-the-art techniques for handling imbalanced data sets and investigate the performance of different methods on microcalcification classification, a classical example of the data imbalance problem. Microcalcifications are tiny deposits of calcium that appear as small bright spots in a mammogram. Classification of microcalcification clusters from mammograms plays an important role in computer-aided diagnosis for the early detection of breast cancer.
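The bias towards the majority class mentioned in the abstract can be illustrated with a small sketch. The data below (95 benign and 5 malignant labels) is hypothetical and not taken from the paper, and random oversampling is used only as one simple, generic rebalancing strategy; it is not claimed to be the method the authors evaluate. The helper names are illustrative.

```python
from collections import Counter
import random


def majority_baseline(labels):
    """Predict the most frequent label for every sample."""
    most_common = Counter(labels).most_common(1)[0][0]
    return [most_common] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        return 0.0
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical imbalanced data set: 0 = benign (majority), 1 = malignant (minority).
y = [0] * 95 + [1] * 5
x = list(range(100))  # placeholder feature values

pred = majority_baseline(y)
acc = accuracy(y, pred)            # high, because most samples are benign
rec = recall(y, pred, positive=1)  # no malignant sample is detected

x_bal, y_bal = random_oversample(x, y)
balanced_counts = Counter(y_bal)

print(f"accuracy={acc:.2f}, minority recall={rec:.2f}, balanced={dict(balanced_counts)}")
```

A classifier that always predicts the majority class scores well on accuracy while missing every minority sample, which is why measures such as minority-class recall are usually reported alongside accuracy on imbalanced problems.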

Index Terms

Computer Science
Information Sciences

Keywords

Classification, Microcalcification, Imbalanced Data Sets, Mammography