Research Article

Text Document Tokenization for Word Frequency Count using Rapid miner

Published in August 2015 by Gaurav Gupta, Sumit Malhotra
International Conference on Advancements in Engineering and Technology
Foundation of Computer Science USA
ICAET2015 - Number 12
August 2015

Gaurav Gupta, Sumit Malhotra. Text Document Tokenization for Word Frequency Count using Rapid miner. International Conference on Advancements in Engineering and Technology. ICAET2015, 12 (August 2015), 24-26.

@article{
author = { Gaurav Gupta and Sumit Malhotra },
title = { Text Document Tokenization for Word Frequency Count using Rapid miner },
journal = { International Conference on Advancements in Engineering and Technology },
issue_date = { August 2015 },
volume = { ICAET2015 },
number = { 12 },
month = { August },
year = { 2015 },
issn = { 0975-8887 },
pages = { 24-26 },
numpages = 3,
url = { /proceedings/icaet2015/number12/22291-4172/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 International Conference on Advancements in Engineering and Technology
%A Gaurav Gupta
%A Sumit Malhotra
%T Text Document Tokenization for Word Frequency Count using Rapid miner
%J International Conference on Advancements in Engineering and Technology
%@ 0975-8887
%V ICAET2015
%N 12
%P 24-26
%D 2015
%I International Journal of Computer Applications
Abstract

Text mining, sometimes called text data mining, is roughly equivalent to text analytics: the process of deriving high-quality information from text. RapidMiner is a widely used open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine that can be integrated into other products. Tokenization is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens. A word frequency counter reports how often each word is used in a document. Applying tokenization and a word frequency counter to a text document (a resume in this case) gives the number of occurrences of every word in the document, but there is no provision for looking up the frequency of a particular word chosen by the user.
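
The tokenize-then-count idea described in the abstract can be illustrated outside RapidMiner with a few lines of Python. This is only a minimal sketch, not the authors' RapidMiner process: the function names `tokenize` and `word_frequencies`, the sample text, and the choice to lowercase the text and split on non-letter characters are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters as tokens.

    Assumption: anything that is not a letter (digits, punctuation,
    whitespace) acts as a token separator.
    """
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


# Hypothetical resume fragment used only to exercise the functions.
sample = "Java developer. Developed Java tools; Java and Python."
freq = word_frequencies(sample)

# Full frequency table, most frequent words first.
for word, count in freq.most_common():
    print(word, count)

# Frequency of one word chosen by the user (0 if the word is absent).
print(freq["java"])
```

Because `Counter` behaves like a dictionary that returns 0 for missing keys, the same table serves both purposes: listing every word with its count and looking up a single word on request.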

Index Terms

Computer Science
Information Sciences

Keywords

RapidMiner, RapidMiner Text Processing, RapidMiner Process Document From File Operator, RapidMiner Transform Case Operator, RapidMiner Tokenize Operator