Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Authors:
Titin Winarti, Jati Kerami, Sunny Arief
10.5120/ijca2017912761

Titin Winarti, Jati Kerami and Sunny Arief. Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming. International Journal of Computer Applications 157(9):8-13, January 2017.

@article{10.5120/ijca2017912761,
	author = {Titin Winarti and Jati Kerami and Sunny Arief},
	title = {Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming},
	journal = {International Journal of Computer Applications},
	issue_date = {January 2017},
	volume = {157},
	number = {9},
	month = {Jan},
	year = {2017},
	issn = {0975-8887},
	pages = {8-13},
	numpages = {6},
	url = {http://www.ijcaonline.org/archives/volume157/number9/26858-2017912761},
	doi = {10.5120/ijca2017912761},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

In term-based clustering with the vector space model, the large number of distinct words always leads to a high-dimensional vector space. This degrades clustering performance because the distances between points tend toward the same value. Stemming can reduce this dimensionality by decreasing the number of distinct words. In this work, stemming was used for term selection to reduce the many terms generated during preprocessing. Applying the Enhanced Confix Stripping stemmer reduced the 199,358 terms obtained from 108 text documents to 5,476 terms. This reduction speeds up processing and saves storage. The clustering results were evaluated using a confusion matrix, and the experiments showed an increase in accuracy.
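To make the term-reduction idea concrete, the following is a toy Python sketch, not the authors' implementation. It assumes a tiny hypothetical root-word dictionary (`ROOT_WORDS`) and a partial, hand-picked set of Indonesian suffixes and prefix recoding rules (`PREFIX_RULES`); the full Enhanced Confix Stripping algorithm uses a large lexicon, many more recoding rules, and checks on disallowed affix combinations. The function names (`suffix_candidates`, `strip_prefixes`, `stem`, `accuracy`) and the confusion-matrix counts are illustrative assumptions. The accuracy helper assumes clusters have already been matched to classes, so correct assignments lie on the diagonal.

```python
"""Toy sketch: a simplified dictionary-based affix stripper for Indonesian
(in the spirit of confix stripping, NOT the full ECS algorithm), the
vocabulary reduction it yields, and accuracy from a confusion matrix."""

# Hypothetical toy root-word dictionary; a real stemmer uses a large lexicon.
ROOT_WORDS = {"makan", "baca", "tulis", "sapu", "ajar", "main", "kirim"}

# Suffixes are removed in this order: particle, possessive, derivational.
PARTICLES = ("lah", "kah", "tah", "pun")
POSSESSIVES = ("ku", "mu", "nya")
DERIVATIONAL = ("kan", "an", "i")

# Partial (prefix, replacement) rules. The replacement restores a letter
# absorbed by a nasal prefix, e.g. "menulis" -> "t" + "ulis" = "tulis".
PREFIX_RULES = [
    ("meny", "s"), ("meng", ""), ("meng", "k"), ("men", ""), ("men", "t"),
    ("mem", ""), ("mem", "p"), ("me", ""),
    ("peny", "s"), ("peng", ""), ("peng", "k"), ("pen", ""), ("pen", "t"),
    ("pem", ""), ("pem", "p"), ("pe", ""),
    ("ber", ""), ("be", ""), ("ter", ""), ("di", ""), ("ke", ""), ("se", ""),
]


def suffix_candidates(word):
    """Return the word plus each form left after successive suffix removal."""
    candidates = [word]
    for group in (PARTICLES, POSSESSIVES, DERIVATIONAL):
        for suffix in group:
            # Remove at most one suffix per group, keeping >= 3 letters.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                candidates.append(word)
                break
    return candidates


def strip_prefixes(word, depth=2):
    """Return a dictionary root reachable by up to `depth` prefix removals."""
    if word in ROOT_WORDS:
        return word
    if depth == 0:
        return None
    for prefix, replacement in PREFIX_RULES:
        if word.startswith(prefix) and len(word) > len(prefix) + 1:
            root = strip_prefixes(replacement + word[len(prefix):], depth - 1)
            if root:
                return root
    return None


def stem(word):
    """Try the least-stripped candidate first; keep unknown words unchanged."""
    word = word.lower()
    for candidate in suffix_candidates(word):
        root = strip_prefixes(candidate)
        if root:
            return root
    return word


def accuracy(confusion):
    """Share of documents on the diagonal of a square confusion matrix
    (rows = true classes, columns = clusters already matched to classes)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


if __name__ == "__main__":
    docs = [
        "membaca bacaan pembaca",
        "menulis tulisan penulis",
        "makan memakan makanannya",
    ]
    raw_terms = {term for doc in docs for term in doc.split()}
    stemmed_terms = {stem(term) for term in raw_terms}
    print(len(raw_terms), "->", len(stemmed_terms))  # prints: 9 -> 3
    # Made-up 2x2 confusion matrix, for illustration only.
    print("accuracy:", accuracy([[30, 6], [4, 32]]))
```

Run as a script, the first line printed is `9 -> 3`: the nine surface forms collapse to the roots baca, tulis and makan. This is the same kind of vocabulary shrinkage the abstract reports, on a far smaller scale. Checking the dictionary before accepting a stem is the main design choice here, because it leaves unknown words unchanged instead of over-stripping them.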

Keywords

Stemming, Clustering, Confusion Matrix, Enhanced Confix Stripping Stemming