Research Article

A Framework for Medical Text Mining using a Novel Categorical Clustering Algorithm

by Anirban Chakrabarty
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 70 - Number 20
Year of Publication: 2013
Authors: Anirban Chakrabarty
10.5120/12184-8240

Anirban Chakrabarty. A Framework for Medical Text Mining using a Novel Categorical Clustering Algorithm. International Journal of Computer Applications. 70, 20 (May 2013), 19-25. DOI=10.5120/12184-8240

@article{ 10.5120/12184-8240,
author = { Anirban Chakrabarty },
title = { A Framework for Medical Text Mining using a Novel Categorical Clustering Algorithm },
journal = { International Journal of Computer Applications },
issue_date = { May 2013 },
volume = { 70 },
number = { 20 },
month = { May },
year = { 2013 },
issn = { 0975-8887 },
pages = { 19-25 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume70/number20/12184-8240/ },
doi = { 10.5120/12184-8240 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:33:23.136231+05:30
%A Anirban Chakrabarty
%T A Framework for Medical Text Mining using a Novel Categorical Clustering Algorithm
%J International Journal of Computer Applications
%@ 0975-8887
%V 70
%N 20
%P 19-25
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapid growth of medical records opens new opportunities for meaningful information retrieval in clinical diagnosis and treatment. Although nursing and pathology records give a complete account of a patient's condition, they are not fully utilized when major decisions about surgery or chemotherapy are made. This research proposes a minimum spanning tree algorithm that partitions training data on different liver diseases into k clusters, which are validated using the silhouette coefficient. A text classification algorithm is then developed that uses the cluster centers as training samples and applies a similarity measure to classify the categorical data. Simulation results show that the proposed algorithm can lower computational complexity and improve accuracy relative to established text classification algorithms such as k-NN. This research can serve as a medical diagnosis tool for classifying patient records and can reveal the important vocabulary that characterizes nursing and pathology records.
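
The pipeline outlined in the abstract (spanning-tree clustering, silhouette validation, classification against cluster centers) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the weight factor or the similarity measure, so this sketch assumes a simple-matching dissimilarity (the fraction of differing attributes), assumes that the k clusters are obtained by cutting the k-1 heaviest tree edges, and assumes that each cluster center is the per-attribute mode. The toy records and all function names are hypothetical.

```python
from collections import Counter

def mismatch(a, b):
    """Assumed dissimilarity: fraction of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mst_edges(records):
    """Prim's algorithm on the complete graph; returns (weight, u, v) edges."""
    best = {v: (mismatch(records[0], records[v]), 0)
            for v in range(1, len(records))}
    edges = []
    while best:
        v = min(best, key=lambda j: best[j][0])   # closest vertex outside the tree
        w, u = best.pop(v)
        edges.append((w, u, v))
        for j in best:                            # relax distances via new vertex v
            d = mismatch(records[v], records[j])
            if d < best[j][0]:
                best[j] = (d, v)
    return edges

def mst_clusters(records, k):
    """Drop the k-1 heaviest MST edges and label the connected components."""
    kept = sorted(mst_edges(records))[: len(records) - k]
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path compression
            i = parent[i]
        return i

    for _, u, v in kept:
        parent[find(u)] = find(v)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(records))]

def silhouette(records, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    scores = []
    for i, r in enumerate(records):
        dists = {}
        for j, s in enumerate(records):
            if i != j:
                dists.setdefault(labels[j], []).append(mismatch(r, s))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                              # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())     # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def cluster_modes(records, labels):
    """Assumed cluster center: most frequent value of each attribute."""
    groups = {}
    for r, lab in zip(records, labels):
        groups.setdefault(lab, []).append(r)
    return {lab: tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))
            for lab, rows in groups.items()}

def classify(record, modes):
    """Assign a record to the cluster whose center is least dissimilar."""
    return min(modes, key=lambda lab: mismatch(record, modes[lab]))

# Hypothetical toy records with four categorical attributes each.
records = [
    ("a", "a", "a", "a"), ("a", "a", "a", "b"), ("a", "a", "b", "a"),
    ("x", "y", "z", "w"), ("x", "y", "z", "a"), ("x", "y", "q", "w"),
]
labels = mst_clusters(records, k=2)
modes = cluster_modes(records, labels)
print(labels, silhouette(records, labels), classify(("x", "y", "q", "a"), modes))
```

Because a new record is compared only with the k cluster centers rather than with every training record, classification costs k comparisons instead of n, which is one plausible reading of the abstract's claim of lower computational cost than k-NN.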

Index Terms

Computer Science
Information Sciences

Keywords

Categorical clustering, spanning tree, weight factor, silhouette coefficient, liver disease