Research Article

Cross-Platform NLP Framework for Detecting LGBTQIA Hate Speech: Evaluation on Reddit and Simulated Twitter Datasets

by Alan Janbey
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 60
Year of Publication: 2025
Authors: Alan Janbey
10.5120/ijca2025925566

Alan Janbey. Cross-Platform NLP Framework for Detecting LGBTQIA Hate Speech: Evaluation on Reddit and Simulated Twitter Datasets. International Journal of Computer Applications. 187, 60 (Nov 2025), 1-12. DOI=10.5120/ijca2025925566

@article{ 10.5120/ijca2025925566,
author = { Alan Janbey },
title = { Cross-Platform NLP Framework for Detecting LGBTQIA Hate Speech: Evaluation on Reddit and Simulated Twitter Datasets },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2025 },
volume = { 187 },
number = { 60 },
month = { Nov },
year = { 2025 },
issn = { 0975-8887 },
pages = { 1-12 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number60/cross-platform-nlp-framework-for-detecting-lgbtqia-hate-speech-evaluation-on-reddit-and-simulated-twitter-datasets/ },
doi = { 10.5120/ijca2025925566 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-11-29T00:49:35.660575+05:30
%A Alan Janbey
%T Cross-Platform NLP Framework for Detecting LGBTQIA Hate Speech: Evaluation on Reddit and Simulated Twitter Datasets
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 60
%P 1-12
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Online hate speech targeting the LGBTQIA community presents a persistent challenge to social cohesion and individual well-being. This study proposes a computational approach to detecting and mitigating such content using Natural Language Processing (NLP) techniques. Data were collected from public Reddit forums, annotated into offensive and acceptable categories, and pre-processed using tokenisation, normalisation, and stopword removal. Both Count Vectorisation and TF-IDF Vectorisation were employed to generate features for training a Decision Tree Classifier. To enhance robustness and assess cross-platform applicability, a simulated evaluation was also conducted on a representative Twitter dataset. The Reddit dataset evaluation yielded an accuracy of 0.76, with strong precision for acceptable content but lower precision for offensive content due to vocabulary variability. The simulated Twitter dataset showed improved balance between precision and recall, achieving an accuracy of 0.81. High-resolution visualisations, including word clouds, class distribution charts, and an NLP workflow diagram, provide insights into data characteristics and model architecture. The results indicate that the proposed approach is effective for detecting offensive language in LGBTQIA-related discourse and adaptable to multiple social media platforms. Future research will explore multilingual extensions, multimodal content analysis, and real-time deployment for proactive content moderation.
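
The pre-processing and vectorisation steps named in the abstract (tokenisation, normalisation, stopword removal, Count Vectorisation, TF-IDF Vectorisation) can be illustrated with a minimal standard-library sketch. This is not the author's implementation: the sample sentences, the stopword list, the function names, and the smoothed IDF formula are all assumptions chosen for illustration, and the Decision Tree training step is omitted.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the paper does not list its stopwords).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "this", "that"}


def preprocess(text):
    """Normalise (lowercase), tokenise on alphabetic runs, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_vocabulary(docs):
    """Map each distinct token to a column index, in alphabetical order."""
    return {tok: i for i, tok in enumerate(sorted({t for d in docs for t in d}))}


def count_vector(doc, vocab):
    """Raw term counts for one tokenised document, one entry per vocabulary term."""
    counts = Counter(doc)
    return [counts.get(tok, 0) for tok in vocab]


def tfidf_vectors(docs, vocab):
    """TF-IDF with a smoothed IDF, idf = ln((1 + N) / (1 + df)) + 1 (assumed variant)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears at least once.
    df = Counter(tok for d in docs for tok in set(d))
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    return [
        [c * idf[tok] for c, tok in zip(count_vector(d, vocab), vocab)]
        for d in docs
    ]


# Made-up placeholder sentences, not data from the study.
texts = [
    "This post is supportive and kind",
    "That comment is rude and hostile",
    "A kind, supportive community",
]
docs = [preprocess(t) for t in texts]
vocab = build_vocabulary(docs)
counts = [count_vector(d, vocab) for d in docs]
tfidf = tfidf_vectors(docs, vocab)

print(list(vocab))
print(counts[0])
print([round(w, 3) for w in tfidf[0]])
```

In this sketch, a term that occurs in fewer documents receives a larger TF-IDF weight than an equally frequent term that occurs in many, which is the property that distinguishes TF-IDF features from raw counts. Either feature matrix could then be passed to a classifier of choice.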

Index Terms

Computer Science
Information Sciences

Keywords

Natural Language Processing, Data analysis, online hate speech, LGBTQIA