Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming

Titin Winarti; Jati Kerami; Sunny Arief

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 157 - Number 9

Year of Publication: 2017

10.5120/ijca2017912761

Titin Winarti, Jati Kerami, Sunny Arief. Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming. International Journal of Computer Applications. 157, 9 (Jan 2017), 8-13. DOI=10.5120/ijca2017912761

@article{10.5120/ijca2017912761,
  author = {Titin Winarti and Jati Kerami and Sunny Arief},
  title = {Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming},
  journal = {International Journal of Computer Applications},
  issue_date = {Jan 2017},
  volume = {157},
  number = {9},
  month = {Jan},
  year = {2017},
  issn = {0975-8887},
  pages = {8-13},
  numpages = {6},
  url = {https://ijcaonline.org/archives/volume157/number9/26858-2017912761/},
  doi = {10.5120/ijca2017912761},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article

%A Titin Winarti

%A Jati Kerami

%A Sunny Arief

%T Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming

%J International Journal of Computer Applications

%@ 0975-8887

%V 157

%N 9

%P 8-13

%D 2017

%I Foundation of Computer Science (FCS), NY, USA

Abstract

In term-based clustering with the vector space model, the large number of distinct words produces a high-dimensional vector space. This degrades clustering performance, because the distances between points tend toward the same value. Stemming can reduce the dimensionality by decreasing the number of distinct terms, so it was used here as a term selection step to reduce the many terms generated during preprocessing. Applying the Enhanced Confix Stripping (ECS) stemmer reduced the 199,358 terms obtained from 108 text documents to 5,476 terms. This reduction speeds up processing and saves storage space. The clustering results were evaluated using a confusion matrix, and the accuracy of the experiment increased.
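
To make the effect described in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. The affix lists, the `toy_stem` function, the three sample documents, and the confusion-matrix counts are all hypothetical and chosen only for illustration. The actual ECS stemmer relies on a root-word dictionary and a much larger rule set. The sketch shows how stemming shrinks a vocabulary, how the reduction ratio follows from the two term counts reported in the abstract, and how accuracy is read off a confusion matrix.

```python
# Hypothetical, heavily simplified affix lists. This is NOT the ECS stemmer,
# which uses a root-word dictionary and many more stripping rules.
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ke", "se")
SUFFIXES = ("kan", "nya", "an", "i")


def toy_stem(word, min_len=3):
    """Strip at most one suffix and one prefix (illustrative only)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[: -len(suffix)]
            break
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    return word


def vocabulary(docs, stem=None):
    """Return the set of distinct terms, optionally after stemming."""
    terms = set()
    for doc in docs:
        for token in doc.lower().split():
            terms.add(stem(token) if stem else token)
    return terms


def accuracy(confusion):
    """Accuracy = correctly assigned documents / all documents."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical sample documents (not from the paper's corpus).
docs = ["membaca buku bacaan", "dibaca bukunya", "baca buku"]
raw_terms = vocabulary(docs)
stemmed_terms = vocabulary(docs, stem=toy_stem)
print(len(raw_terms), "terms before stemming,", len(stemmed_terms), "after")

# Reduction ratio computed from the two counts reported in the abstract.
reduction = 1 - 5476 / 199358
print(f"Reported term reduction: {reduction:.1%}")

# Hypothetical 2-cluster confusion matrix: rows = true class, cols = cluster.
example_confusion = [[8, 2], [1, 9]]
print("Accuracy:", accuracy(example_confusion))
```

Each distinct term is one dimension in the vector space model, so a smaller vocabulary directly means shorter document vectors. Going from 199,358 to 5,476 terms removes roughly 97% of the terms.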

Index Terms

Computer Science

Information Sciences

Keywords

Stemming, Clustering, Confusion Matrix, Enhanced Confix Stripping Stemming