Text Document Tokenization for Word Frequency Count using Rapid miner

IJCA Proceedings on International Conference on Advancements in Engineering and Technology
© 2015 by IJCA Journal
ICAET 2015 - Number 12
Year of Publication: 2015
Authors:
Gaurav Gupta
Sumit Malhotra

Gaurav Gupta and Sumit Malhotra. Article: Text Document Tokenization for Word Frequency Count using Rapid miner. IJCA Proceedings on International Conference on Advancements in Engineering and Technology ICAET 2015(12):24-26, August 2015. Full text available. BibTeX

@article{key:article,
	author = {Gaurav Gupta and Sumit Malhotra},
	title = {Article: Text Document Tokenization for Word Frequency Count using Rapid miner},
	journal = {IJCA Proceedings on International Conference on Advancements in Engineering and Technology},
	year = {2015},
	volume = {ICAET 2015},
	number = {12},
	pages = {24-26},
	month = {August},
	note = {Full text available}
}

Abstract

Text mining, sometimes referred to as text data mining, is roughly equivalent to text analytics, which refers to the process of deriving high-quality information from text. RapidMiner is a leading open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine that can be integrated into other products. Tokenization is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens. A word frequency counter counts how often each word occurs in a document. Applying tokenization and a word frequency counter to a text document (a resume in this case) reveals the number of occurrences of every word in the document, but it offers no provision for looking up the frequency of a particular word chosen by the user.
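
The tokenization and word-frequency steps described in the abstract can be illustrated outside RapidMiner with a minimal Python sketch. This is not the authors' RapidMiner process; the function names (`tokenize`, `word_frequencies`), the token pattern, and the sample resume text are assumptions made purely for illustration. The sketch also shows one possible way to look up the frequency of a single word chosen by the user.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase tokens.

    Assumption: a token is a run of letters, digits, or apostrophes;
    everything else (spaces, punctuation) acts as a separator.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text):
    """Return a Counter mapping each token to its number of occurrences."""
    return Counter(tokenize(text))


# Hypothetical resume snippet, used only as sample input.
resume_text = "Java developer. Developed Java and Python tools; Python testing."

freq = word_frequencies(resume_text)

# Frequency of every word in the document, most frequent first.
for word, count in freq.most_common():
    print(word, count)

# Frequency of one particular word chosen by the user.
# A Counter returns 0 for words that never occur.
chosen = "python"
print(chosen, freq[chosen])
```

Because `Counter` behaves like a dictionary, a single lookup such as `freq["java"]` answers the per-word question directly, and a word that is absent from the document simply yields a count of 0.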
