| International Journal of Computer Applications |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 187 - Number 56 |
| Year of Publication: 2025 |
| Authors: Ajay Guyyala, Prudhvi Ratna Badri Satya, Vijay Putta, Krishna Teja Areti |
| DOI: 10.5120/ijca2025925964 |
Ajay Guyyala, Prudhvi Ratna Badri Satya, Vijay Putta, Krishna Teja Areti. RAG-based AI Agents for Multilingual Help Desks in Low-Bandwidth Environments. International Journal of Computer Applications. 187, 56 (Nov 2025), 15-28. DOI=10.5120/ijca2025925964
The increasing demand for multilingual help desk systems has created a need for solutions that provide accurate, real-time responses across many languages. This paper presents a retrieval-augmented generation (RAG) based system optimized for low-bandwidth environments. The proposed system integrates retrieval techniques with generative models, enabling it to generate contextually relevant responses while minimizing latency. To address the challenge of low-bandwidth operation, model distillation and token compression methods are introduced, which reduce model size and response time. The system’s performance is evaluated on multilingual datasets, demonstrating substantial improvements over baseline models in terms of accuracy, recall, precision, and F1-score. The approach addresses the challenges of multilingual support, retrieval accuracy, and low-latency performance, making it a viable solution for real-time customer support in resource-constrained settings. The findings suggest that the proposed system can serve as a robust platform for multilingual help desks, offering improved scalability and efficiency.

The system was built using a hybrid retriever-generator architecture, with a cross-lingual transformer for retrieval and a transformer-based sequence-to-sequence model for generation. Multilingual datasets, including TyDiQA, mMARCO, XQuAD, MLDoc, and AfriSenti, were used for training and evaluation. Low-bandwidth optimization techniques such as model distillation and token compression were applied. The proposed system achieved higher exact match (EM), BLEU, and mean reciprocal rank (MRR) scores than baseline models, with EM of 79.2%, BLEU of 32.8, and MRR of 0.80, while reducing latency from 3.4 s in the baseline to 2.1 s. The distilled model further reduced latency to 1.8 s with minor performance trade-offs. Error analysis showed reduced hallucination rates and improved relevance in responses for low-resource languages.
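For readers unfamiliar with the reported metrics, the following is a minimal Python sketch of how exact match and mean reciprocal rank are commonly computed, together with the relative latency reduction implied by the figures above. It is not the authors' evaluation code: the function names, the whitespace-and-case normalization, and the assumption of a single relevant document per query are illustrative choices made here.

```python
from typing import Sequence


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse whitespace only.
    # Published EM scripts often also strip punctuation and articles.
    return " ".join(text.lower().split())


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    # Fraction of predictions identical to their reference after normalization.
    pairs = list(zip(predictions, references))
    hits = sum(normalize(p) == normalize(r) for p, r in pairs)
    return hits / len(pairs)


def mean_reciprocal_rank(ranked_lists: Sequence[Sequence[str]],
                         relevant_ids: Sequence[str]) -> float:
    # Average of 1/rank of the relevant document for each query;
    # a query whose relevant document is never retrieved contributes 0.
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id == relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


def relative_reduction(before: float, after: float) -> float:
    # Fractional reduction, e.g. for latency in seconds.
    return (before - after) / before


if __name__ == "__main__":
    print(exact_match_rate(["Paris ", "berlin"], ["paris", "Rome"]))
    print(mean_reciprocal_rank([["a", "b"], ["x", "y", "z"]], ["b", "z"]))
    print(round(relative_reduction(3.4, 2.1), 3))
    print(round(relative_reduction(3.4, 1.8), 3))
```

Under these definitions, the reported drop from 3.4 s to 2.1 s corresponds to a latency reduction of roughly 38%, and the distilled model's 1.8 s to roughly 47% relative to the baseline.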