Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data
DOI: 10.5120/3920-5521
Mrs. Shomona Gracia Jacob and Dr. R. Geetha Ramani. Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data. International Journal of Computer Applications 32(7):46-53, October 2011.
@article{key:article,
  author = {Mrs. Shomona Gracia Jacob and Dr. R. Geetha Ramani},
  title = {Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data},
  journal = {International Journal of Computer Applications},
  year = {2011},
  volume = {32},
  number = {7},
  pages = {46-53},
  month = {October},
  note = {Full text available}
}
Abstract
This paper highlights the significance of classification in data mining and knowledge discovery. We investigate the performance of several data mining classification algorithms, including Rnd Tree, the Quinlan decision tree algorithm (C4.5), and the K-Nearest Neighbor algorithm, on the ‘Wisconsin Breast tissue dataset’ (derived from the UCI Machine Learning Repository), which comprises 11 attributes and 106 instances. The results report the accuracy and other performance measures of the algorithms in detecting the presence of breast cancer and the associated breast tissue conditions that increase the risk of developing cancer in the future. The importance of feature selection/reduction in improving the performance of classification algorithms is also described. The Rnd Tree algorithm produced 100 percent accuracy in classifying all the training data across multiple classes. The algorithm was also applied to test data to verify its correctness.
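The abstract separates accuracy on the training data from a later check on test data. The sketch below is not the authors' experiment and does not use the breast tissue data. It is a minimal illustration, assuming synthetic two-dimensional points in three made-up classes and a hand-written 1-nearest-neighbour classifier, of why a perfect training score should be confirmed on held-out data. All function names, class labels, and parameters (`make_synthetic`, `knn_predict`, `accuracy`, the 70/30 split) are hypothetical choices made for this example.

```python
import math
import random


def knn_predict(train, query, k=1):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    # On a tie, the label seen first (the closest neighbour's) wins.
    return max(votes, key=votes.get)


def accuracy(train, test, k=1):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def make_synthetic(n_per_class=30, seed=0):
    """Synthetic stand-in data: three overlapping Gaussian clusters (an assumption)."""
    rng = random.Random(seed)
    centres = {"class_a": (0.0, 0.0), "class_b": (1.5, 1.5), "class_c": (3.0, 0.0)}
    data = []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            point = (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


data = make_synthetic()
split = int(0.7 * len(data))          # 70% for training, 30% held out
train, test = data[:split], data[split:]

# With k=1 every training point is its own nearest neighbour,
# so accuracy measured on the training set itself is 1.0 by construction.
train_acc = accuracy(train, train, k=1)
# Accuracy on held-out points is the more informative figure.
test_acc = accuracy(train, test, k=1)

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

In this toy setting the training score says nothing about generalisation, which is why the held-out figure is reported alongside it.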