Analysis of Opinion Mining on Social Media Data Streams using Hadoop

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Padala S. Venkata Durga Gayatri, Archana Raghuvamshi

Padala S. Venkata Durga Gayatri and Archana Raghuvamshi. Analysis of Opinion Mining on Social Media Data Streams using Hadoop. International Journal of Computer Applications 155(6):45-49, December 2016.

BibTeX

	author = {Padala S. Venkata Durga Gayatri and Archana Raghuvamshi},
	title = {Analysis of Opinion Mining on Social Media Data Streams using Hadoop},
	journal = {International Journal of Computer Applications},
	issue_date = {December 2016},
	volume = {155},
	number = {6},
	month = {Dec},
	year = {2016},
	issn = {0975-8887},
	pages = {45-49},
	numpages = {5},
	url = {},
	doi = {10.5120/ijca2016912336},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}


Twitter is a social networking site that produces large volumes of data in the form of structured, semi-structured and unstructured streams. Opinion mining over Twitter offers organizations a fast and effective way to monitor public sentiment towards their services. It focuses on predicting the polarity of words and then classifying them as positive or negative, with the aim of identifying the attitudes and opinions expressed in any form or language. Bian et al. (2012) annotated a Twitter corpus focused on Adverse Drug Reactions (ADR) with broad pharmacological coverage. Bingwei et al. (2013) evaluated the scalability of the Naive Bayes classifier (NBC) on large datasets instead of using a standard library. Skuza et al. (2015) estimated future stock prices by performing the calculations in a distributed environment according to the Map-Reduce programming model. Mohit et al. (2014) explained how the Map-Reduce paradigm can be applied to the existing Naive Bayes algorithm to handle a large number of tweets. All of these approaches report their accuracy on real-world data sets processed with the Hadoop Distributed File System (HDFS). This paper presents a comparative analysis of the above methods.
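
The Map-Reduce formulation of Naive Bayes training mentioned above can be illustrated with a minimal in-memory sketch in Python. It is not the implementation of any of the surveyed papers and it does not use Hadoop. The function names (`map_tweet`, `reduce_counts`, `train`, `classify`), the four sample tweets and the add-one (Laplace) smoothing are assumptions made purely for illustration.

```python
import math
from collections import defaultdict

def map_tweet(label, text):
    """Map step: emit one count per labelled tweet and one per word in it."""
    yield ("doc", label), 1
    for word in text.lower().split():
        yield ("word", label, word), 1

def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

def train(labelled_tweets):
    """Run the map step over every tweet, then reduce the emitted pairs."""
    pairs = (pair
             for label, text in labelled_tweets
             for pair in map_tweet(label, text))
    return reduce_counts(pairs)

def classify(counts, text):
    """Pick the label with the highest Naive Bayes log-score (add-one smoothing)."""
    labels = sorted(key[1] for key in counts if key[0] == "doc")
    vocab = {key[2] for key in counts if key[0] == "word"}
    total_docs = sum(counts[("doc", label)] for label in labels)
    best_label, best_score = None, -math.inf
    for label in labels:
        words_in_label = sum(value for key, value in counts.items()
                             if key[0] == "word" and key[1] == label)
        # Log prior of the label.
        score = math.log(counts[("doc", label)] / total_docs)
        # Log likelihood of each word given the label.
        for word in text.lower().split():
            seen = counts.get(("word", label, word), 0)
            score += math.log((seen + 1) / (words_in_label + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data, not taken from any of the cited corpora.
tweets = [
    ("positive", "great service love it"),
    ("positive", "love the fast delivery"),
    ("negative", "terrible service slow delivery"),
    ("negative", "hate the slow app"),
]
model = train(tweets)
print(classify(model, "love it"))
print(classify(model, "slow app"))
```

On a Hadoop cluster the map and reduce functions would run as separate tasks over tweets stored in the distributed file system. Here both run in a single process so that the counting logic is easy to follow.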


  1. T. Wilson, J. Wiebe, and P. Hoffmann, “Recognizing contextual polarity in phrase-level sentiment analysis,” in Proceedings of HLT and EMNLP. ACL, (2005), pp. 347–354
  2. C. C. Tao, S. K. Kim, Y. A. Lin, Y. Y. Yu, G. Bradski, A. Y. Ng and K. Olukotun, “Map-reduce for machine learning on multicore”, In NIPS, vol. 6, (2006), pp. 281-288.
  3. B. Jiang, U. Topaloglu and F. Yu, “Towards large-scale twitter mining for drug-related adverse events”, In Proceedings of the 2012 international workshop on Smart health and wellbeing, ACM, (2012), pp. 25-32.
  4. K. Jiang and Y. Zheng, “Mining Twitter Data for Potential Drug Effects”, In Advanced Data Mining and Applications, Springer, (2013), pp. 434–443.
  5. M. Gamon, A. Aue, S. Corston-Oliver, and E. Ringger, “Pulse: Mining customer opinions from free text,” in Advances in Intelligent Data Analysis VI. Springer, 2005, pp. 121–132.
  6. U. Kang, D. H. Chau, and C. Faloutsos, “Mining large graphs: Algorithms, inference, and discoveries,” in Data Engineering (ICDE), 2011 IEEE 27th International Conference on, 2011, pp. 243–254.
  7. D. Pessemier and Martens “MovieTweetings: A Movie Reviews Dataset Collected From Twitter”, Ghent University, Ghent, Belgium, (2013).
  8. M. Thomas, B. Pang, and L. Lee, “Get out the vote: Determining support or opposition from congressional floor-debate transcripts,” in Proceedings of the 2006 conference on empirical methods in natural language processing. Association for Computational Linguistics, 2006, pp. 327–335.
  9. L. Bingwei, E. Blasch, Y. Chen, D. Shen and G. Chen, “Scalable Sentiment Classification for Big Data Analysis Using Naive Bayes Classifier”, In Big Data, 2013 IEEE International Conference on, IEEE, (2013), pp. 99-104.
  10. Twitter. Twitter Search API, available at
  11. S. Michal and A. Romanowski, “Sentiment analysis of Twitter data within big data distributed environment for stock prediction”, In Computer Science and Information Systems (FedCSIS), 2015 Federated Conference on, IEEE, (2015), pp. 1349-1354
  12. T. White, “Hadoop: The Definitive Guide”, Third Edition, O'Reilly
  13. Z. Malkani and E. Gillie, “Supervised Multi-Class Classification of Tweets”, (2012).
  14. T. Mohit, I. Gohokar, J. Sable, D. Paratwar and R. Wajgi, “Multi-Class Tweet Categorization Using Map Reduce Paradigm”, In International Journal of Computer Trends and Technology. (2014), pp. 78-81.


Twitter, social networking sites, Naive Bayes Classifier (NBC), Map-Reduce, Hadoop Distributed File System (HDFS).