International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 25
Year of Publication: 2025
Authors: Amel Muminovic
Amel Muminovic. Moderating Harm: Benchmarking Large Language Models for Cyberbullying Detection in YouTube Comments. International Journal of Computer Applications 187, 25 (Jul 2025), 1-9. DOI=10.5120/ijca2025925403
As online platforms grow, comment sections increasingly host harassment that undermines user experience and well-being. This study benchmarks three state-of-the-art large language models (OpenAI GPT-4.1, Google Gemini 1.5 Pro, and Anthropic Claude 3 Opus) on a corpus of 5,080 YouTube comments drawn from high-abuse videos on gaming, lifestyle, food vlog, and music channels. The dataset comprises 1,334 harmful and 3,746 non-harmful messages in English, Arabic, and Indonesian, annotated independently by two reviewers with almost perfect agreement (Cohen’s κ = 0.83). Each model is evaluated in a strict zero-shot setting with an identical minimal prompt and deterministic decoding, giving a fair multi-language comparison without task-specific tuning. GPT-4.1 achieves the best balance, with an F1 score of 0.863, precision of 0.887, and recall of 0.841. Gemini flags the largest share of harmful posts (recall = 0.875), but its precision falls to 0.767 because of frequent false positives. Claude attains the highest precision (0.920) and the lowest false-positive rate (0.022), yet its recall drops to 0.720. Qualitative analysis shows that all three models struggle with sarcasm, coded insults, and mixed-language slang. The findings highlight the need for moderation pipelines that combine complementary models, incorporate conversational context, and fine-tune for under-represented languages and implicit abuse. A de-identified version of the dataset, along with the prompts and model outputs, has been made available to support reproducibility and further progress in automated content moderation.
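The headline numbers are standard binary-classification metrics; GPT-4.1's F1 of 0.863, for example, is the harmonic mean of its precision (0.887) and recall (0.841): 2 × 0.887 × 0.841 / (0.887 + 0.841) ≈ 0.863. The sketch below is a minimal illustration of how such scores and a per-model false-positive rate can be computed with scikit-learn; the label and prediction lists are placeholders, not the paper's data, which would come from the released prompts and model outputs.

```python
from sklearn.metrics import precision_recall_fscore_support, confusion_matrix

# Placeholder gold labels and model verdicts (1 = harmful, 0 = non-harmful);
# in the study these would be the annotator labels and each model's zero-shot outputs.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

# Precision, recall, and F1 for the harmful (positive) class.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label=1
)

# False-positive rate = FP / (FP + TN), the quantity reported per model in the abstract.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} fpr={fpr:.3f}")
```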