Combination of K-Nearest Neighbor and K-Means based on Term Re-weighting for Classify Indonesian News

Putu Wira Buana; Sesaltina Jannet D.r.m.; I Ketut Gede Darma Putra

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 50 - Number 11

Year of Publication: 2012

Authors: Putu Wira Buana, Sesaltina Jannet D.r.m., I Ketut Gede Darma Putra

10.5120/7817-1105

Putu Wira Buana, Sesaltina Jannet D.r.m., I Ketut Gede Darma Putra. Combination of K-Nearest Neighbor and K-Means based on Term Re-weighting for Classify Indonesian News. International Journal of Computer Applications. 50, 11 (July 2012), 37-42. DOI=10.5120/7817-1105

@article{ 10.5120/7817-1105,

author = { Putu Wira Buana and Sesaltina Jannet D.r.m. and I Ketut Gede Darma Putra },

title = { Combination of K-Nearest Neighbor and K-Means based on Term Re-weighting for Classify Indonesian News },

journal = { International Journal of Computer Applications },

issue_date = { July 2012 },

volume = { 50 },

number = { 11 },

month = { July },

year = { 2012 },

issn = { 0975-8887 },

pages = { 37-42 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume50/number11/7817-1105/ },

doi = { 10.5120/7817-1105 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T20:48:03.277032+05:30

%A Putu Wira Buana

%A Sesaltina Jannet D.r.m.

%A I Ketut Gede Darma Putra

%T Combination of K-Nearest Neighbor and K-Means based on Term Re-weighting for Classify Indonesian News

%J International Journal of Computer Applications

%@ 0975-8887

%V 50

%N 11

%P 37-42

%D 2012

%I Foundation of Computer Science (FCS), NY, USA

Abstract

KNN is a widely accepted classification tool, but it uses all training samples during classification, which leads to high computational complexity. To address this problem, this paper proposes combining the traditional KNN algorithm with the K-Means clustering algorithm. After preprocessing, the first step is to weight each word (term) using Term Frequency-Inverse Document Frequency (TF-IDF), which weights a word based on how often it appears in a document. Second, the training samples of each category are clustered with the K-Means algorithm, and all cluster centers are taken as the new training samples. Third, the modified training samples are used for classification with the KNN algorithm. Finally, the classification is evaluated using precision, recall, and F-measure. The simulation results show that the proposed combined algorithm reaches an accuracy of 87% and an average F-measure of 0.8029 with the best k-value of 5, and that the computation takes 55 seconds for one document.
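The pipeline described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy Indonesian tokens, the use of cosine similarity for KNN, the choice of the first k points as K-Means seeds, and the tiny values of k are all hypothetical. Preprocessing (tokenizing, stop-word removal, stemming) is assumed to have been done already.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Learn the vocabulary and IDF weights from tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {term: math.log(n / df[term]) for term in vocab}
    return vocab, idf


def tfidf(doc, vocab, idf):
    """Turn one tokenized document into a TF-IDF vector (unknown terms are ignored)."""
    tf = Counter(doc)
    return [tf[term] * idf[term] for term in vocab]


def kmeans(points, k, iters=20):
    """Plain k-means with squared Euclidean distance; seeds are the first k points (assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old center if the group is empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return centers


def reduce_training_set(vectors, labels, clusters_per_class):
    """Replace each category's samples with that category's k-means cluster centers."""
    centers, center_labels = [], []
    for label in sorted(set(labels)):
        members = [v for v, l in zip(vectors, labels) if l == label]
        for c in kmeans(members, min(clusters_per_class, len(members))):
            centers.append(c)
            center_labels.append(label)
    return centers, center_labels


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(x, centers, center_labels, k):
    """Majority vote among the k cluster centers most similar to x."""
    ranked = sorted(range(len(centers)), key=lambda i: cosine(x, centers[i]), reverse=True)
    votes = Counter(center_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def f_measure(true, pred, cls):
    """F-measure of one class from its precision and recall."""
    tp = sum(t == cls and p == cls for t, p in zip(true, pred))
    fp = sum(t != cls and p == cls for t, p in zip(true, pred))
    fn = sum(t == cls and p != cls for t, p in zip(true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical, already-preprocessed toy corpus (two categories).
train_docs = [["bola", "gol", "tim"], ["tim", "liga", "gol"],
              ["saham", "bank", "rupiah"], ["bank", "pasar", "saham"]]
train_labels = ["sport", "sport", "economy", "economy"]

vocab, idf = fit_idf(train_docs)
train_vecs = [tfidf(d, vocab, idf) for d in train_docs]
centers, center_labels = reduce_training_set(train_vecs, train_labels, clusters_per_class=1)

test_docs = [["gol", "liga", "bola"], ["rupiah", "pasar", "bank"]]
test_labels = ["sport", "economy"]
predictions = [knn_predict(tfidf(d, vocab, idf), centers, center_labels, k=1)
               for d in test_docs]
print(predictions, f_measure(test_labels, predictions, "sport"))
```

The point of the reduction step is that KNN now compares a new document against a handful of cluster centers per category instead of every training document, which is where the computational saving comes from. On a real corpus, the number of clusters per category and the KNN k would need tuning; the abstract reports k = 5 as the best value for the authors' data.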

Index Terms

Computer Science

Information Sciences

Keywords

Text Classification, KNN Classification Algorithm, K-Means Cluster Algorithm, TF-IDF Method