
An Automated Technique using Gaussian Naïve Bayes Classifier to Classify Breast Cancer

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
B. M. Gayathri, C. P. Sumathi

B M Gayathri and C P Sumathi. An Automated Technique using Gaussian Naïve Bayes Classifier to Classify Breast Cancer. International Journal of Computer Applications 148(6):16-21, August 2016.

BibTeX:

	author = {B. M. Gayathri and C. P. Sumathi},
	title = {An Automated Technique using Gaussian Naïve Bayes Classifier to Classify Breast Cancer},
	journal = {International Journal of Computer Applications},
	issue_date = {August 2016},
	volume = {148},
	number = {6},
	month = {Aug},
	year = {2016},
	issn = {0975-8887},
	pages = {16-21},
	numpages = {6},
	url = {},
	doi = {10.5120/ijca2016911146},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Objectives: The proposed work classifies breast cancer using only a few attributes. Reducing the number of attributes reduces processing time, so that the patient need not wait long for a result. A user-friendly environment is created for classification: the user enters patient details such as clump thickness and uniformity of cell size, and the record is classified as benign or malignant.

Statistical analysis: Variables are selected using Linear Discriminant Analysis (LDA), a statistical method for variable reduction. The dataset is passed to the LDA function repeatedly, and the combination of variables that gave good accuracy is selected. The selected variables are then used to classify breast cancer.

Findings: The application determines whether a given record describes a benign or a malignant tumor. It uses the breast cancer dataset from the UCI repository. Many existing works on estimating breast cancer risk or diagnosing breast cancer use at least ten variables for classification, which may be time consuming. The proposed work uses only four and achieved an accuracy of up to 96%. It may therefore serve as a first step toward detecting breast cancer with fewer variables, which may be helpful for doctors.

Improvements: The proposed work is based on the UCI Machine Learning Repository dataset, which was provided by the University of Wisconsin Hospitals, Madison. With some changes to the code, the methodology can also be applied to other datasets by reducing their attributes.
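The classification step described in the abstract can be illustrated with a minimal Gaussian Naïve Bayes sketch. This is not the authors' implementation: the function names `fit_gnb` and `predict_gnb`, the variance floor, and the ten training records are assumptions made up for illustration, and the four columns merely stand in for whichever four attributes the LDA step selected. The LDA-based variable selection itself is not reproduced here.

```python
import math

def fit_gnb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant feature cannot cause division by zero
        # (the floor value is an assumption, not taken from the paper).
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior under the Gaussian model."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            # Log of the normal density N(v; m, s2), summed over features
            # because Naive Bayes treats features as conditionally independent.
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical records on a 1-10 scale with four attributes each.
# These values are invented for illustration and are NOT from the UCI dataset.
X_train = [
    [5, 1, 1, 2], [3, 1, 2, 1], [4, 2, 1, 3], [1, 1, 1, 1], [2, 1, 2, 2],
    [8, 10, 8, 7], [10, 7, 9, 10], [7, 8, 10, 6], [9, 6, 7, 9], [8, 9, 8, 8],
]
y_train = ["benign"] * 5 + ["malignant"] * 5

model = fit_gnb(X_train, y_train)
print(predict_gnb(model, [4, 1, 1, 2]))  # benign
print(predict_gnb(model, [9, 8, 9, 8]))  # malignant
```

Working in log space keeps the product of small densities from underflowing. With the real dataset, the same two functions would be fitted on the four selected columns only.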




Keywords: Classification, Mahalanobis, Normalization, Fisher, data-preprocessing