Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
Shahzad Qaiser, Ramsha Ali
10.5120/ijca2018917395

Shahzad Qaiser and Ramsha Ali. Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents. International Journal of Computer Applications 181(1):25-29, July 2018.

@article{10.5120/ijca2018917395,
	author = {Shahzad Qaiser and Ramsha Ali},
	title = {Text Mining: Use of TF-IDF to Examine the Relevance of Words to Documents},
	journal = {International Journal of Computer Applications},
	issue_date = {July 2018},
	volume = {181},
	number = {1},
	month = {Jul},
	year = {2018},
	issn = {0975-8887},
	pages = {25-29},
	numpages = {5},
	url = {http://www.ijcaonline.org/archives/volume181/number1/29681-2018917395},
	doi = {10.5120/ijca2018917395},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

This paper discusses the use of TF-IDF (term frequency-inverse document frequency) to examine the relevance of keywords to the documents in a corpus. The study focuses on how the algorithm can be applied to a number of documents. First, the working principle of TF-IDF and the steps to follow when implementing it are elaborated. Second, results from executing the algorithm are presented to verify the findings, and the strengths and weaknesses of the TF-IDF algorithm are compared. The paper also discusses how these weaknesses can be tackled. Finally, the work is summarized and future research directions are discussed.
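
The abstract refers to the working principle of TF-IDF without giving the formulas. Here is a minimal Python sketch that assumes the commonly used formulation: term frequency is a term's count in a document divided by the document's token count, and inverse document frequency is log10(N / df), where N is the number of documents and df is the number of documents that contain the term. The paper's own normalization and log base may differ. The function name `tf_idf` and the toy corpus are hypothetical, and no preprocessing (stop-word removal, stemming) is applied.

```python
import math
from collections import Counter


def tf_idf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Score every term of every tokenized document with TF-IDF.

    Assumed formulation (not necessarily the paper's exact variant):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = log10(N / df(t)), N = number of documents,
                 df(t) = number of documents containing t
    """
    n_docs = len(corpus)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in corpus for term in set(doc))

    scores = []
    for doc in corpus:
        counts = Counter(doc)
        length = len(doc)
        # TF-IDF = normalized term frequency * inverse document frequency.
        scores.append({
            term: (count / length) * math.log10(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the bird flew"]
    scores = tf_idf([d.split() for d in docs])
    # "the" occurs in every document, so its IDF (and TF-IDF) is 0;
    # "cat" occurs in only one document, so it scores higher than "sat".
    for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
        print(f"{term}: {score:.4f}")
```

A term that appears in every document receives zero weight. This is how TF-IDF suppresses common words and highlights words that distinguish one document from the rest of the corpus.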

Keywords

TF-IDF, Data Mining, Relevance of Words to Documents