Results and Inference Obtained from a Small Implementation of the DF-ICF- The Modified TF-IDF

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2017
Authors:
Vidya Kamath
10.5120/ijca2017913877

Vidya Kamath. Results and Inference Obtained from a Small Implementation of the DF-ICF- The Modified TF-IDF. International Journal of Computer Applications 166(1):20-23, May 2017.

@article{10.5120/ijca2017913877,
	author = {Vidya Kamath},
	title = {Results and Inference Obtained from a Small Implementation of the DF-ICF- The Modified TF-IDF},
	journal = {International Journal of Computer Applications},
	issue_date = {May 2017},
	volume = {166},
	number = {1},
	month = {May},
	year = {2017},
	issn = {0975-8887},
	pages = {20-23},
	numpages = {4},
	url = {http://www.ijcaonline.org/archives/volume166/number1/27633-2017913877},
	doi = {10.5120/ijca2017913877},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

DF-ICF is an algorithm derived from the well-known TF-IDF, with the aim of improving performance and reliability. This work mainly presents a validation of the new algorithm. For the experiments, the algorithm was implemented on Hadoop using Cloudera, VMware and WampServer, and the results of one such experiment are reported. Finally, the performance of DF-ICF is predicted, based on assumptions, by comparing it with that of TF-IDF. Overall, DF-ICF was found to perform better than TF-IDF.
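
For orientation, the TF-IDF baseline that DF-ICF is compared against can be sketched as follows. This page does not give the DF-ICF formula, so the sketch covers only conventional TF-IDF weighting and cosine similarity. The function names, the toy corpus, and the choice of the plain log(N/df) variant are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, already tokenized on whitespace.
corpus = [
    "big data needs scalable text mining".split(),
    "text mining ranks terms by weight".split(),
    "hadoop processes big data in parallel".split(),
]
vectors = tf_idf_vectors(corpus)
```

Documents that share no terms score 0 under this measure, which is why term weighting matters most for terms that occur in some documents but not all.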

References

  1. Puneet Goswami, Vidya Kamath, “The DF-ICF algorithm- Modified TF-IDF”, International Journal of Computer Applications, Volume 93, No 13, May 2014.
  2. SALTON G, BUCKLEY C. Term-weighting approaches in automatic text retrieval [J]. Information Processing and Management, 1988, pp. 513-523.
  3. SALTON G, CLEMENT T Y. On the construction of effective vocabularies for information retrieval [C]. Proceedings of the 1973
  4. Bin Li, Yuan Guoyong, “Improvement of tf-idf for Hadoop Framework”, The 2nd International Conference on Computer Application and System Modeling (2012).
  5. Moty Fania, John David Miller, White paper, “Mining Big Data in the Enterprise for Better Business Intelligence”, Intel, July 2012.
  6. Puneet Goswami, Vidya Kamath, “Big Data- Driving force for innovation and Value Recreation”, IJARCSSE, Volume 4, Issue 3, March 2014.
  7. Jana Vembunarayanan, “TF-IDF and cosine similarity”, Seeking Wisdom, Oct 2013.
  8. “Making data Analytics Work- three Key Challenges”, McKinsey and Company. IDC Digital Universe Study, sponsored by EMC, June 2011.
  9. Stamatis Karnouskos, “Big data analytics for Smart Grid Cities”, Eurescom mess@ge 1-2013.
  10. Thomas H Davenport, Jill Dyche, “Big Data in Big Companies”, International Institute for Analytics, May 2013.
  11. M. Santhanakumar, C. Christopher Columbus, “A modified frequency based term weighting approach for information retrieval”, Int. J. Chem. Sci.: 14(1), 2016, 449-457, ISSN 0972-768X.
  12. Liu Zhenyan, Meng Dan, Wang Weiping, Zhang Chunxia, “A Supervised Parameter Estimation Method of LDA”, 17th Asia-Pacific Web Conference, APWeb 2015, China, 2015 proceedings, Springer.
  13. Chengzhi Zhang, Huilin Wang, Yao Liu, and Hongjiao Xu, “Document Clustering Description Extraction and Its Application”, W. Li and D. Mollá-Aliod (Eds.): ICCPOL 2009, LNAI 5459, pp. 370–377, 2009, Springer.

Keywords

TF-IDF, DF-ICF, Cosine Similarity, Document, Term, Corpus