Research Article

A Comprehensive Literature Review on Deep Learning–Driven Multilingual Chatbots for Low-Resource Languages with a Focus on Marathi–Hindi–English Interaction

by Bharti Borade, Charansing N. Kayte
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 77
Year of Publication: 2026
Authors: Bharti Borade, Charansing N. Kayte
10.5120/ijca2026926240

Bharti Borade, Charansing N. Kayte. A Comprehensive Literature Review on Deep Learning–Driven Multilingual Chatbots for Low-Resource Languages with a Focus on Marathi–Hindi–English Interaction. International Journal of Computer Applications. 187, 77 (Jan 2026), 44-53. DOI=10.5120/ijca2026926240

@article{ 10.5120/ijca2026926240,
author = { Bharti Borade and Charansing N. Kayte },
title = { A Comprehensive Literature Review on Deep Learning–Driven Multilingual Chatbots for Low-Resource Languages with a Focus on Marathi–Hindi–English Interaction },
journal = { International Journal of Computer Applications },
issue_date = { Jan 2026 },
volume = { 187 },
number = { 77 },
month = { Jan },
year = { 2026 },
issn = { 0975-8887 },
pages = { 44-53 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number77/a-comprehensive-literature-review-on-deep-learningdriven-multilingual-chatbots-for-low-resource-languages-with-a-focus-on-marathihindienglish-interaction/ },
doi = { 10.5120/ijca2026926240 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2026-02-01T00:33:39.954168+05:30
%A Bharti Borade
%A Charansing N. Kayte
%T A Comprehensive Literature Review on Deep Learning–Driven Multilingual Chatbots for Low-Resource Languages with a Focus on Marathi–Hindi–English Interaction
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 77
%P 44-53
%D 2026
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Conversational Artificial Intelligence (AI) has undergone substantial progress, evolving from rule-based systems to advanced transformer-driven multilingual models. However, research for low-resource Indian languages—particularly Marathi and Hindi—remains limited despite rapid technological advances. This review synthesizes studies from 2000 to 2025, covering rule-based chatbots, retrieval methods, Seq2Seq architectures, multilingual transformers, and self-supervised speech models such as wav2vec 2.0 and HuBERT. The analysis highlights key linguistic challenges, including agglutination, free word order, transliteration, regional accents, and pervasive code-mixing. Although models like mBERT, XLM-R, and MuRIL significantly improve multilingual understanding, they still struggle with hybrid inputs and domain-specific conversational tasks. Persistent gaps include limited datasets, weak ASR–NLU integration, and insufficient cultural grounding. The review outlines future directions for developing robust, culturally aligned Marathi–Hindi–English chatbots.
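
To make the code-mixing and transliteration challenges named above concrete, the following is a minimal illustrative sketch and not a method taken from the reviewed paper. It labels each whitespace-separated token of an utterance by script, using the Devanagari Unicode block (U+0900 to U+097F) and ASCII letters. The function names `token_script` and `script_profile` and the sample utterance are hypothetical, chosen only for this example.

```python
from collections import Counter


def token_script(token: str) -> str:
    """Label a token by the script(s) of its letters (illustrative heuristic)."""
    scripts = set()
    for ch in token:
        if "\u0900" <= ch <= "\u097f":        # Devanagari block
            scripts.add("devanagari")
        elif ch.isascii() and ch.isalpha():   # plain Latin letters
            scripts.add("latin")
    if not scripts:
        return "other"                        # digits, punctuation, other scripts
    if len(scripts) > 1:
        return "mixed"                        # both scripts inside one token
    return scripts.pop()


def script_profile(utterance: str) -> Counter:
    """Count tokens per script label in a whitespace-tokenized utterance."""
    return Counter(token_script(tok) for tok in utterance.split())


# Hypothetical Marathi-English code-mixed utterance, used only as sample input.
profile = script_profile("मला उद्या meeting आहे, please remind कर")
print(profile["devanagari"], profile["latin"])
```

Script counts of this kind only detect mixing at the level of writing systems. Marathi and Hindi share Devanagari, and romanized Hindi or Marathi is labeled as Latin, so a check like this cannot tell the three languages apart; that is one reason hybrid and transliterated inputs remain difficult.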

References
  1. Abbet, Christian, et al. “Churn Intent Detection in Multilingual Chatbot Conversations and Social Media.” Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), Association for Computational Linguistics, 2018, pp. 161–170.
  2. Adamopoulou, Eleni, and Lefteris Moussiades. “Chatbots: History, Technology, and Applications.” Machine Learning with Applications, vol. 2, 2020, article 100006.
  3. Baevski, Alexei, et al. “wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.” Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 12449–12460.
  4. Bali, Kalika, et al. “Code-Mixing: The New Normal in India.” Proceedings of the 11th International Conference on Natural Language Processing (ICON 2014), 2014.
  5. Brown, Tom B., et al. “Language Models Are Few-Shot Learners.” Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 1877–1901.
  6. Caldarini, Guendalina, Sardar Jaf, and Kenneth McGarry. “A Literature Survey of Recent Advances in Chatbots.” Information, vol. 13, no. 1, 2022, article 41.
  7. Cassell, Justine, et al. “Embodied Conversational Agents: Representation and Intelligence in User Interfaces.” AI Magazine, vol. 22, no. 4, 2000, pp. 67–83.
  8. Chakravarthi, Bharathi Raja, et al. “DravidianCodeMix: Sentiment Analysis and Offensive Language Identification Dataset for Dravidian Languages in Code-Mixed Text.” Language Resources and Evaluation, vol. 57, 2023, pp. 367–403.
  9. Conneau, Alexis, et al. “Unsupervised Cross-Lingual Representation Learning at Scale.” Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), 2020, pp. 8440–8451.
  10. Conneau, Alexis, et al. “XLS-R: Self-Supervised Cross-Lingual Speech Representation Learning at Scale.” Proceedings of Interspeech 2021, 2021, pp. 2278–2282.
  11. Deriu, Jan, et al. “Survey on Evaluation Methods for Dialogue Systems.” Artificial Intelligence Review, vol. 54, 2021, pp. 755–810.
  12. Devlin, Jacob, et al. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” Proceedings of NAACL-HLT 2019, 2019, pp. 4171–4186.
  13. Fryer, Luke K., and R. Carpenter. “Bots as Language Learning Tools.” Computer Assisted Language Learning, vol. 19, no. 4–5, 2006, pp. 465–477.
  14. Gambäck, Björn, and Amitava Das. “On Measuring the Complexity of Code-Mixing.” Proceedings of the 11th International Conference on Natural Language Processing (ICON 2014), 1st Workshop on Language Technologies for Indian Social Media, 2014, pp. 1–7.
  15. Guerreiro, Miguel Porfirio, et al. “Conversational Agents for Health and Well-Being Across the Life Course: Systematic Review.” JMIR Medical Informatics, vol. 9, no. 9, 2021, e26680.
  16. Hsu, Wei-Ning, et al. “HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units.” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, 2021, pp. 3451–3460.
  17. Khanuja, Simran, et al. “MuRIL: Multilingual Representations for Indian Languages.” arXiv preprint arXiv:2103.10730, 2021.
  18. Kuhail, Mohammad Amin, et al. “Interacting with Educational Chatbots: A Systematic Review.” Education and Information Technologies, vol. 28, 2023, pp. 973–1018.
  19. Kunchukuttan, Anoop, et al. “IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-Trained Multilingual Language Models for Indian Languages.” Findings of the Association for Computational Linguistics: EMNLP 2020, 2020, pp. 4948–4961.
  20. Labadze, Levan, et al. “Role of AI Chatbots in Education: Systematic Literature Review.” International Journal of Educational Technology in Higher Education, vol. 20, no. 1, 2023, article 32.
  21. Laranjo, Liliana, et al. “Conversational Agents in Healthcare: A Systematic Review.” Journal of the American Medical Informatics Association, vol. 25, no. 9, 2018, pp. 1248–1258.
  22. Li, Jiwei, et al. “A Diversity-Promoting Objective Function for Neural Conversation Models.” Proceedings of NAACL-HLT 2016, 2016, pp. 110–119.
  23. Montenegro, Jorge Luis Z., et al. “Survey of Conversational Agents in Health.” Expert Systems with Applications, vol. 129, 2019, pp. 56–67.
  24. Okonkwo, Chinedu Wilfred, and Adenike Ade-Ibijola. “Chatbots Applications in Education: A Systematic Review.” Computers and Education: Artificial Intelligence, vol. 2, 2021, article 100033.
  25. Park, Min Sook, et al. “A Survey of Conversational Agents and Their Applications for Self-Management of Chronic Conditions.” Proceedings of the IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC 2023), 2023, pp. 1064–1075.
  26. Philip, Jithin, et al. “Revisiting Low Resource Status of Indian Languages in Machine Translation.” arXiv preprint arXiv:2008.04860, 2020.
  27. Radford, Alec, et al. “Language Models Are Unsupervised Multitask Learners.” OpenAI Technical Report, 2019.
  28. Serban, Iulian Vlad, et al. “Building End-to-End Dialogue Systems Using Generative Hierarchical Neural Network Models.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30, 2016.
  29. Shang, Lifeng, Zhengdong Lu, and Hang Li. “Neural Responding Machine for Short-Text Conversation.” Proceedings of ACL-IJCNLP 2015, 2015, pp. 1577–1586.
  30. Singh, Sonali Uttam, and Akbar Siami Namin. “A Survey on Chatbots and Large Language Models: Testing and Evaluation Techniques.” Natural Language Processing Journal, vol. 10, 2025, article 100128.
  31. Sitaram, Sunayana, et al. “A Survey of Code-Mixed Speech and Natural Language Processing.” ACM Computing Surveys, vol. 55, no. 2, 2023, pp. 1–38.
  32. Smutný, Petr, and Petra Schreiberová. “Chatbots for Learning: A Review of Educational Chatbots for the Facebook Messenger.” Computers & Education, vol. 151, 2020, article 103862.
  33. Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. “Sequence to Sequence Learning with Neural Networks.” Advances in Neural Information Processing Systems, vol. 27, 2014, pp. 3104–3112.
  34. Thara, S., and K. Poornachandran. “Code-Mixing: A Brief Survey.” Proceedings of the 2nd International Conference on Recent Trends in Advanced Computing (ICRTAC), 2019.
  35. Tudor Car, Lorainne, et al. “Conversational Agents in Health Care: Scoping Review and Conceptual Analysis.” Journal of Medical Internet Research, vol. 22, no. 8, 2020, e17158.
  36. Vaswani, Ashish, et al. “Attention Is All You Need.” Advances in Neural Information Processing Systems, vol. 30, 2017, pp. 5998–6008.
  37. Verma, Tanya, et al. “ASR for Low Resource and Multilingual Noisy Code-Mixed Speech.” Proceedings of Interspeech 2023, 2023, pp. 4269–4273.
  38. Vinyals, Oriol, and Quoc V. Le. “A Neural Conversational Model.” Proceedings of the 32nd International Conference on Machine Learning, Deep Learning Workshop, 2015.
  39. Wallace, Richard S. “The Anatomy of A.L.I.C.E.” Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer, edited by Robert Epstein et al., Springer, 2009, pp. 181–210.
  40. Zhou, Li, et al. “The Design and Implementation of XiaoIce, an Empathetic Social Chatbot.” Computational Linguistics, vol. 46, no. 1, 2020, pp. 53–93.
Index Terms

Computer Science
Information Sciences

Keywords

Multilingual Chatbots; Marathi–Hindi–English NLP; Transformer Models; mBERT; XLM-R; MuRIL; Seq2Seq; wav2vec 2.0; HuBERT; Low-Resource Languages; Code-Mixing; Conversational AI; Speech Recognition; Natural Language Understanding (NLU); Deep Learning; Indian Languages