Research Article

Classification of Tweets based on Emotions using Word Embedding and Random Forest Classifiers

by Parth Vora, Mansi Khara, Kavita Kelkar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 178 - Number 3
Year of Publication: 2017
10.5120/ijca2017915773

Parth Vora, Mansi Khara, Kavita Kelkar. Classification of Tweets based on Emotions using Word Embedding and Random Forest Classifiers. International Journal of Computer Applications. 178, 3 (Nov 2017), 1-7. DOI=10.5120/ijca2017915773

@article{ 10.5120/ijca2017915773,
author = { Parth Vora and Mansi Khara and Kavita Kelkar },
title = { Classification of Tweets based on Emotions using Word Embedding and Random Forest Classifiers },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2017 },
volume = { 178 },
number = { 3 },
month = { Nov },
year = { 2017 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume178/number3/28651-2017915773/ },
doi = { 10.5120/ijca2017915773 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:49:21.606492+05:30
%A Parth Vora
%A Mansi Khara
%A Kavita Kelkar
%T Classification of Tweets based on Emotions using Word Embedding and Random Forest Classifiers
%J International Journal of Computer Applications
%@ 0975-8887
%V 178
%N 3
%P 1-7
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Social media has penetrated our daily lives on a large scale and has become a platform for individuals to share and express their views, feelings, opinions, and thoughts. Identifying emotions has many applications, ranging from personalized marketing to behavioral studies. Individuals express their feelings in language that is frequently ambiguous and rich in figures of speech, which makes it difficult even for humans to comprehend. In this paper, we propose a new approach to classifying text into emotion categories. We use Twitter data labeled by hashtags as input, and we address features of informal social media language such as emoticons, emoji, apostrophes, Twitter slang, and spelling variations. Our model uses word vectors generated by architectures such as Word2vec, GloVe, and fastText to represent the text. We then investigate the utility of these models with a random forest classifier and compare the results to find the best model for emotion-based text classification. We achieve an overall precision of 91% for four emotion classes on a mined dataset of more than 100,000 tweets. Such a classifier is a useful tool for understanding human behavior and a natural step beyond positive/negative polarity.
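The following is a minimal sketch of two steps the abstract describes: labeling tweets from their hashtags and turning word vectors into a fixed-length tweet representation. It is not the authors' implementation. The hashtag-to-emotion mapping, the four class names, the toy three-dimensional vectors, and the choice to average word vectors are all assumptions made for illustration; the abstract does not specify any of them.

```python
import re
from statistics import fmean

# Hypothetical hashtag-to-emotion mapping; the paper's actual hashtags
# and class names are not given in the abstract.
HASHTAG_TO_EMOTION = {
    "#happy": "happy",
    "#sad": "sad",
    "#angry": "angry",
    "#relaxed": "relaxed",
}

# Toy 3-dimensional embeddings standing in for pretrained Word2vec,
# GloVe, or fastText vectors (values are made up for illustration).
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "day": [0.5, 0.5, 0.0],
    "terrible": [0.0, 0.2, 0.8],
}

TOKEN_RE = re.compile(r"#?\w+")


def label_from_hashtags(tweet):
    """Return the emotion if the hashtags signal exactly one, else None."""
    tags = {t.lower() for t in re.findall(r"#\w+", tweet)}
    emotions = {HASHTAG_TO_EMOTION[t] for t in tags if t in HASHTAG_TO_EMOTION}
    return emotions.pop() if len(emotions) == 1 else None


def tokenize(tweet):
    """Lowercase the tweet and drop hashtags so labels do not leak into features."""
    tokens = TOKEN_RE.findall(tweet.lower())
    return [t for t in tokens if not t.startswith("#")]


def tweet_vector(tokens, vectors, dim=3):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [fmean(component) for component in zip(*known)]


if __name__ == "__main__":
    tweet = "Great day #happy"
    print(label_from_hashtags(tweet))
    print(tweet_vector(tokenize(tweet), TOY_VECTORS))
```

Feature vectors built this way, paired with the hashtag-derived labels, could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`. Tweets whose hashtags signal zero or several emotions are left unlabeled in this sketch, which is one possible way to handle ambiguous cases.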

Index Terms

Computer Science
Information Sciences

Keywords

Word vectors, random forests, Word2vec, Glove, emotion, text classification