Research Article

Performance Investigation of Feature Selection Methods and Sentiment Lexicons for Sentiment Analysis

Published in July 2012 by Anuj Sharma and Shubhamoy Dey
Advanced Computing and Communication Technologies for HPC Applications
Foundation of Computer Science USA
ACCTHPCA - Number 3
July 2012
Authors: Anuj Sharma, Shubhamoy Dey

Anuj Sharma, Shubhamoy Dey. Performance Investigation of Feature Selection Methods and Sentiment Lexicons for Sentiment Analysis. Advanced Computing and Communication Technologies for HPC Applications. ACCTHPCA, 3 (July 2012), 15-20.

@article{
author = { Anuj Sharma and Shubhamoy Dey },
title = { Performance Investigation of Feature Selection Methods and Sentiment Lexicons for Sentiment Analysis },
journal = { Advanced Computing and Communication Technologies for HPC Applications },
issue_date = { July 2012 },
volume = { ACCTHPCA },
number = { 3 },
month = { July },
year = { 2012 },
issn = { 0975-8887 },
pages = { 15-20 },
numpages = 6,
url = { /specialissues/accthpca/number3/7566-1020/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Special Issue Article
%1 Advanced Computing and Communication Technologies for HPC Applications
%A Anuj Sharma
%A Shubhamoy Dey
%T Performance Investigation of Feature Selection Methods and Sentiment Lexicons for Sentiment Analysis
%J Advanced Computing and Communication Technologies for HPC Applications
%@ 0975-8887
%V ACCTHPCA
%N 3
%P 15-20
%D 2012
%I International Journal of Computer Applications
Abstract

Sentiment analysis, or opinion mining, has become an open research domain since the proliferation of the Internet and Web 2.0 social media. People express their attitudes and opinions on social media such as blogs, discussion forums and tweets, and sentiment analysis is concerned with detecting and extracting sentiment or opinion from such online text. Sentiment-based text classification differs from topical text classification because it discriminates on the opinion expressed about a topic rather than on the topic itself. Feature selection is important for sentiment analysis because opinionated text can have very high dimensionality, which can degrade the performance of a sentiment analysis classifier. This paper explores the applicability of feature selection methods to sentiment analysis and investigates their classification performance in terms of recall, precision and accuracy. Five feature selection methods (Document Frequency, Information Gain, Gain Ratio, Chi Squared and Relief-F) and three popular sentiment feature lexicons (HM, GI and Opinion Lexicon) are investigated on a movie review corpus of 2000 documents. The experimental results show that Information Gain gave consistent results and Gain Ratio performed best overall for sentiment feature selection, while the sentiment lexicons gave poor performance. Furthermore, we found that the performance of the classifier depends on selecting an appropriate number of representative features from the text.
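Four of the five feature selection methods named above are filter methods. They score each term independently against the class labels. The sketch below shows how Document Frequency, Information Gain, Gain Ratio and Chi Squared could be computed. It assumes binary term-presence features and a labelled two-class corpus. The function names and the four toy documents are hypothetical and invented for illustration; this is not the authors' implementation. Relief-F is left out because it scores features from nearest-neighbour distances rather than from per-term counts.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def term_scores(docs, labels, term):
    """Score one term by its presence/absence against the class labels.

    docs: list of token sets (hypothetical binary bag-of-words representation).
    labels: list of class labels, one per document.
    """
    n = len(docs)
    present = [term in d for d in docs]
    df = sum(present)  # Document Frequency: number of documents containing the term

    # Information Gain = H(C) - H(C | term present/absent)
    h_c = entropy(list(Counter(labels).values()))
    h_c_given_t = 0.0
    for flag in (True, False):
        subset = [lab for lab, p in zip(labels, present) if p == flag]
        if subset:
            h_c_given_t += len(subset) / n * entropy(list(Counter(subset).values()))
    ig = h_c - h_c_given_t

    # Gain Ratio = IG divided by the split information of the presence/absence split
    split_info = entropy([df, n - df])
    gr = ig / split_info if split_info > 0 else 0.0

    # Chi Squared over the (term present/absent) x (class) contingency table
    chi2 = 0.0
    for c in sorted(set(labels)):
        n_c = labels.count(c)
        for flag, n_flag in ((True, df), (False, n - df)):
            observed = sum(1 for lab, p in zip(labels, present) if lab == c and p == flag)
            expected = n_c * n_flag / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    return {"DF": df, "IG": ig, "GR": gr, "CHI": chi2}


# Toy labelled corpus (invented for illustration only)
docs = [
    {"great", "acting", "plot"},
    {"great", "fun"},
    {"boring", "plot"},
    {"boring", "awful"},
]
labels = ["pos", "pos", "neg", "neg"]

for t in ("great", "plot"):
    print(t, term_scores(docs, labels, t))
```

On this toy corpus, "great" appears only in positive documents and so receives the maximum Information Gain and Gain Ratio (1 bit) and a Chi Squared value of 4. "plot" appears once in each class and scores zero on all three supervised criteria. Both terms still have the same Document Frequency of 2. This illustrates why Document Frequency, which ignores the labels, can rank sentiment features differently from the supervised scores.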

Index Terms

Computer Science
Information Sciences

Keywords

Sentiment Analysis, Feature Selection, Sentiment Lexicon, Classification