| International Journal of Computer Applications |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 187 - Number 116 |
| Year of Publication: 2026 |
| Authors: Bertilla Fernandes, Snehalata B. Shirude |
10.5120/ijca037ed09d2626
Bertilla Fernandes, Snehalata B. Shirude. Error Analysis of BERT model for Chatbot using various Performance Measures. International Journal of Computer Applications. 187, 116 (Jun 2026), 18-23. DOI=10.5120/ijca037ed09d2626
Conversational Agents (CAs) offer many opportunities to change how people engage with information and computer systems in a more natural, accessible way. However, these CAs can also fall short of human expectations and "fail". BERT, developed by Google, is a notable advance in natural language processing (NLP), with impressive results on a variety of tasks, including chatbots. BERT models are designed to capture the intricate contextual relationships between the words in a sentence. Evaluation metrics for the Question Answering task can be used to assess the factuality of large language models (LLMs). This study explains the evaluation measures used for error analysis of the BERT transformer model in conversational agents, and details the strengths and limitations of these measures for evaluating chatbot response generation. We systematically analyzed the impact of six different types of conversational errors. Several variants of the BERT model are evaluated, and a detailed analysis of the evaluation measures for error analysis is performed on a Python FAQ dataset containing questions, answers, and context. The analysis showed that BERTSCORE correlates better with human judgments and yields better model selection performance than existing metrics. Finally, the paper concludes with a discussion of the strengths and limitations of the various metrics for error analysis of conversational agents.
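To make the contrast between lexical Question Answering metrics and BERTSCORE concrete, the following is a minimal sketch, not the authors' implementation. It assumes SQuAD-style Exact Match and token-level F1 (with the usual lowercase, punctuation, and article normalization) and a simplified BERTSCORE that greedily matches per-token embeddings by cosine similarity. The embeddings are supplied by the caller; here they are hypothetical toy vectors rather than real BERT outputs, and IDF weighting and baseline rescaling are omitted. The `bert-score` Python package provides the full metric.

```python
import math
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace
    (SQuAD-style normalization, assumed here)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall over shared normalized tokens."""
    pred = normalize_answer(prediction).split()
    ref = normalize_answer(reference).split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def bertscore_greedy(cand_emb, ref_emb):
    """Simplified BERTSCORE: each token is matched to its most similar token
    on the other side; no IDF weighting, no baseline rescaling."""
    sim = [[cosine(c, r) for r in ref_emb] for c in cand_emb]
    # Precision: average best match for each candidate token.
    precision = sum(max(row) for row in sim) / len(cand_emb)
    # Recall: average best match for each reference token.
    recall = sum(
        max(sim[i][j] for i in range(len(cand_emb))) for j in range(len(ref_emb))
    ) / len(ref_emb)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    pred, ref = "A list is mutable.", "Lists are mutable"
    print(exact_match(pred, ref), round(token_f1(pred, ref), 3))
    # Hypothetical 2-d token embeddings, for illustration only.
    print(bertscore_greedy([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

For the paraphrased answer above, Exact Match is 0.0 and token F1 is about 0.333, because "list" and "lists" count as different tokens. An embedding-based score can instead credit near-synonymous tokens whose contextual vectors are close, which is one plausible reason a BERTSCORE-style metric can track human judgments more closely on chatbot responses.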