Research Article

Sentiment Analysis using Averaged Histogram

by Subarno Pal, Soumadip Ghosh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 162 - Number 12
Year of Publication: 2017
Authors: Subarno Pal, Soumadip Ghosh
10.5120/ijca2017913421

Subarno Pal, Soumadip Ghosh. Sentiment Analysis using Averaged Histogram. International Journal of Computer Applications. 162, 12 (Mar 2017), 22-26. DOI=10.5120/ijca2017913421

@article{ 10.5120/ijca2017913421,
author = { Subarno Pal and Soumadip Ghosh },
title = { Sentiment Analysis using Averaged Histogram },
journal = { International Journal of Computer Applications },
issue_date = { Mar 2017 },
volume = { 162 },
number = { 12 },
month = { Mar },
year = { 2017 },
issn = { 0975-8887 },
pages = { 22-26 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume162/number12/27296-2017913421/ },
doi = { 10.5120/ijca2017913421 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:08:55.232591+05:30
%A Subarno Pal
%A Soumadip Ghosh
%T Sentiment Analysis using Averaged Histogram
%J International Journal of Computer Applications
%@ 0975-8887
%V 162
%N 12
%P 22-26
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Sentiment analysis, or opinion mining, is the process of identifying and categorizing the sentiment expressed in a text. The need for automatic sentiment retrieval is high because the number of reviews available on the Internet is huge: e-commerce websites, social networks, and movie review websites receive large numbers of new reviews regularly. Reviews of popular products help in determining public opinion towards those products. This paper proposes an averaged histogram model that treats text classification as a continuous-variable problem. After data cleaning and feature extraction from the reviews, an averaged histogram is constructed for every class, containing a generalized feature representation of that class. The histogram of every test element is then matched against the averaged histograms of the classes using k-Nearest Neighbor and Naïve Bayesian classifiers. Results on 3000 reviews show a steady classification accuracy of 79-80% with the Naïve Bayesian classifier at very little computational cost, and with an increase in the size of the training dataset k-Nearest Neighbor can reach an accuracy of up to 85%. The proposed work is language independent: it neither uses any dictionary nor depends on the meaning of any word.
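
The averaged histogram idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which features are extracted, how large the histograms are, or how distances are measured, so the hashed character trigrams, the 64-bin size (`N_BINS`), the squared Euclidean distance, and all function names below are assumptions chosen only to keep the example dictionary-free and self-contained.

```python
import zlib
from collections import defaultdict

N_BINS = 64  # hypothetical histogram size, not taken from the paper


def histogram(text, n=3, bins=N_BINS):
    """Normalized histogram of hashed character n-grams (an assumed feature)."""
    text = text.lower()
    counts = [0.0] * bins
    for i in range(len(text) - n + 1):
        gram = text[i:i + n].encode("utf-8")
        # crc32 is used instead of hash() so bins are stable across runs
        counts[zlib.crc32(gram) % bins] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def averaged_histograms(samples):
    """Average the histograms of all training reviews within each class."""
    sums = {}
    sizes = defaultdict(int)
    for text, label in samples:
        h = histogram(text)
        if label not in sums:
            sums[label] = [0.0] * len(h)
        sums[label] = [a + b for a, b in zip(sums[label], h)]
        sizes[label] += 1
    return {label: [v / sizes[label] for v in total]
            for label, total in sums.items()}


def classify(text, class_histograms):
    """Return the class whose averaged histogram is nearest to the text's."""
    h = histogram(text)

    def distance(label):
        # Squared Euclidean distance; the paper's matching rule may differ
        return sum((a - b) ** 2 for a, b in zip(h, class_histograms[label]))

    return min(class_histograms, key=distance)


if __name__ == "__main__":
    train = [
        ("great movie, loved it", "positive"),
        ("terrible plot, hated it", "negative"),
    ]
    model = averaged_histograms(train)
    print(classify("loved this great film", model))
```

The sketch matches a test histogram only against the nearest averaged class histogram. The paper itself reports results with k-Nearest Neighbor and Naïve Bayesian classifiers, which would replace the `classify` step shown here.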

Index Terms

Computer Science
Information Sciences

Keywords

Averaged Histogram