Research Article

Article:Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data

by Mrs.Shomona Gracia Jacob, Dr. R.Geetha Ramani
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 32 - Number 7
Year of Publication: 2011
Authors: Mrs.Shomona Gracia Jacob, Dr. R.Geetha Ramani
DOI: 10.5120/3920-5521

Mrs.Shomona Gracia Jacob, Dr. R.Geetha Ramani. Article:Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data. International Journal of Computer Applications. 32, 7 (October 2011), 46-53. DOI=10.5120/3920-5521

@article{ 10.5120/3920-5521,
author = { Mrs.Shomona Gracia Jacob and Dr. R.Geetha Ramani },
title = { Article:Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data },
journal = { International Journal of Computer Applications },
issue_date = { October 2011 },
volume = { 32 },
number = { 7 },
month = { October },
year = { 2011 },
issn = { 0975-8887 },
pages = { 46-53 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume32/number7/3920-5521/ },
doi = { 10.5120/3920-5521 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:18:36.369594+05:30
%A Mrs.Shomona Gracia Jacob
%A Dr. R.Geetha Ramani
%T Article:Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 32
%N 7
%P 46-53
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper highlights the significance of classification in data mining and knowledge discovery. We investigate the performance of several data mining classification algorithms, including Rnd Tree, the Quinlan decision tree algorithm (C4.5), and the K-Nearest Neighbor algorithm, on a large dataset from the ‘Wisconsin Breast tissue dataset’ (derived from the UCI Machine Learning Repository), which comprises 11 attributes and 106 instances. The results indicate the accuracy and other performance measures of the algorithms in detecting the presence of breast cancer and the associated breast tissue conditions that increase the risk of developing cancer in the future. The importance of feature selection/reduction in improving the performance of classification algorithms is also described. The Rnd Tree algorithm produced 100 percent accuracy in classifying all the training data under multiple classes. The algorithm was also applied to test data to verify its correctness.
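
The distinction the abstract draws between accuracy on training data and correctness on test data can be illustrated with a minimal sketch. It is not the authors' experimental setup: it uses a plain-Python 1-nearest-neighbour classifier (one of the algorithm families named above) on a small, made-up three-class data set, and the function names `knn_predict`, `training_accuracy`, and `leave_one_out_accuracy` are hypothetical. Leave-one-out evaluation is assumed here as one simple way to estimate accuracy on unseen instances.

```python
import math
from collections import Counter


def knn_predict(train, query, k=1):
    """Return the majority label among the k training points nearest to query."""
    # train is a list of (feature_tuple, label) pairs.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def training_accuracy(data, k=1):
    """Accuracy when each instance is classified with the full data set, itself included."""
    hits = sum(knn_predict(data, x, k) == y for x, y in data)
    return hits / len(data)


def leave_one_out_accuracy(data, k=1):
    """Accuracy when each instance is classified using only the other instances."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out instance i
        hits += knn_predict(rest, x, k) == y
    return hits / len(data)


# Made-up two-feature, three-class toy data (NOT the Breast Tissue data).
toy_data = [
    ((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.8, 1.1), "A"),
    ((3.0, 3.0), "B"), ((3.1, 2.8), "B"), ((2.9, 3.2), "B"),
    ((5.0, 1.0), "C"), ((5.2, 1.1), "C"),
    ((2.0, 2.0), "C"),  # an atypical class-C point lying between the A and B clusters
]

train_acc = training_accuracy(toy_data)
loo_acc = leave_one_out_accuracy(toy_data)
print(f"training accuracy:      {train_acc:.2f}")
print(f"leave-one-out accuracy: {loo_acc:.2f}")
```

On this toy data the training accuracy is 1.0, because every point is its own nearest neighbour, while the leave-one-out accuracy is 8/9, because the atypical point is misclassified once it is held out. A perfect score on training data therefore says little by itself, which is why a separate check on test data matters.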

Index Terms

Computer Science
Information Sciences

Keywords

Knowledge Patterns, Pattern Recognition, Clinical Data, Healthcare, Breast Cancer, Breast Tissue Classification