Research Article

Exploring N-gram, Word Embedding and Topic Models for Content-based Fake News Detection in FakeNewsNet Evaluation

by Oluwafemi Oriola
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 39
Year of Publication: 2020
Authors: Oluwafemi Oriola
10.5120/ijca2020920503

Oluwafemi Oriola. Exploring N-gram, Word Embedding and Topic Models for Content-based Fake News Detection in FakeNewsNet Evaluation. International Journal of Computer Applications. 176, 39 (Jul 2020), 25-30. DOI=10.5120/ijca2020920503

@article{ 10.5120/ijca2020920503,
author = { Oluwafemi Oriola },
title = { Exploring N-gram, Word Embedding and Topic Models for Content-based Fake News Detection in FakeNewsNet Evaluation },
journal = { International Journal of Computer Applications },
issue_date = { Jul 2020 },
volume = { 176 },
number = { 39 },
month = { Jul },
year = { 2020 },
issn = { 0975-8887 },
pages = { 25-30 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume176/number39/31460-2020920503/ },
doi = { 10.5120/ijca2020920503 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:40:52.889740+05:30
%A Oluwafemi Oriola
%T Exploring N-gram, Word Embedding and Topic Models for Content-based Fake News Detection in FakeNewsNet Evaluation
%J International Journal of Computer Applications
%@ 0975-8887
%V 176
%N 39
%P 25-30
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

FakeNewsNet is a repository of two novel datasets, PolitiFact and GossipCop, which are used to evaluate fake news detection techniques. Unlike other extensively studied benchmark fake news datasets, the FakeNewsNet datasets incorporate news content, social context, and dynamic information, which can be used to study fake news propagation, detection, and mitigation. Existing work on FakeNewsNet has focused on one-hot encoding, social context such as user-based models, and dynamic information such as news propagation models. However, n-gram, word embedding, and topic models of news content, which have performed well in other contexts, have not been explored. This paper therefore explores n-gram, word embedding, and topic models of news content for the evaluation of the FakeNewsNet datasets. A unigram-based n-gram model, a skip-gram word2vec-based word embedding model, and a Latent Dirichlet Allocation-based topic model are extracted after preprocessing the datasets. The features are weighted by TFIDF to overcome the shortcomings of the individual models and are analyzed using Logistic Regression. The evaluation of the models and their hybrids shows that the n-gram model outperforms the word embedding and topic models. Specifically, the n-gram model records accuracy, precision, recall, and F1-score of 0.80, 0.79, 0.78, and 0.79, respectively, for PolitiFact and 0.82, 0.75, 0.79, and 0.77, respectively, for GossipCop. A comparison with benchmarks also shows that the n-gram model performs better.
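
The best-performing configuration described above (unigram features weighted by TFIDF, classified with Logistic Regression, and scored by accuracy, precision, recall, and F1) can be illustrated with a minimal sketch. It is not the author's implementation: the four training headlines, the labels (1 = fake, 0 = real), the smoothed IDF formula, the learning rate, and all function names are assumptions chosen for illustration, and only the Python standard library is used so that the example stays self-contained. In practice a library such as scikit-learn would normally be used instead.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real preprocessing would do more.
    return text.lower().split()

def fit_tfidf(docs):
    """Learn the unigram vocabulary and smoothed IDF weights (assumed formula)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def transform(docs, vocab, idf):
    """Map each document to an L2-normalised TFIDF vector; unseen words are ignored."""
    rows = []
    for d in docs:
        tf = Counter(tokenize(d))
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return rows

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by batch gradient descent (no regularisation)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(X, w, b):
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

def scores(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for the positive (fake) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical toy data; not drawn from PolitiFact or GossipCop.
train_docs = [
    "shocking secret cure doctors hate",
    "celebrity secretly married alien shocking",
    "senate passes budget bill after vote",
    "city council approves new budget",
]
train_labels = [1, 1, 0, 0]

vocab, idf = fit_tfidf(train_docs)
X_train = transform(train_docs, vocab, idf)
w, b = train_logreg(X_train, train_labels)

train_pred = predict(X_train, w, b)
test_pred = predict(transform(["shocking secret cure", "council vote on budget"],
                              vocab, idf), w, b)
accuracy, precision, recall, f1 = scores(train_labels, train_pred)
```

On this toy input the classifier separates the two classes, which shows only that the pipeline is wired correctly; it says nothing about performance on the FakeNewsNet datasets.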

Index Terms

Computer Science
Information Sciences

Keywords

Fake News Detection, FakeNewsNet, Classification, News Content Features, TFIDF