Pitfalls of AI chatbot in Customer Support

October 31, 2024

In my other blog post “Introduction to RAG system”, I discussed the benefits of RAG systems in providing contextually relevant responses.

In the ideal world, RAG systems would greatly enhance customer experience and cut down workload for the customer support department.

But, mind the pitfalls.

There have been cases of companies having to refund customers against their policies just because their chatbots hallucinate out of thin data.

The famous case of Air Canada lawsuit where the chatbot “misled” customer into purchasing ticket at full price, promising a partial refund
Sirius XM’s chatbot offering refund after getting pushed back from customers

When the AI model gives false information that does not exist in the database, it is called "hallucination".

What causes hallucination in AI?

Hallucinations in AI, particularly in language models, occur when the model generates information that is incorrect, nonsensical, or not grounded in any real data. There are several core reasons for this phenomenon:

Training Data Limitations

Incomplete or Noisy Data: If the model's training data lacks relevant information or contains incorrect information, it may "fill in the gaps" with plausible-sounding but incorrect details.

Lack of Specialized Knowledge: General-purpose language models are trained on a wide range of topics but lack deep knowledge in niche areas, leading to incorrect or fabricated responses in specialized fields.

Bias Toward Coherent Responses

Language models are designed to produce responses that sound fluent and plausible. They prioritize coherence and grammatical structure over factual accuracy, leading them to sometimes "invent" information that fits the context of the conversation, even if it’s untrue.

Absence of Real-Time Data Access

Models typically do not access real-time or external databases during generation (unless explicitly integrated with a retrieval system like RAG). This lack of real-time grounding causes the model to base responses solely on its training data, which may be outdated or incomplete, leading to inaccuracies.

Ambiguity in User Prompts

Vague or ambiguous prompts can lead models to make assumptions or guesses. When unsure, a model may fill in the missing details with its "best guess," which can result in hallucination if it relies on patterns rather than facts.

Misalignment Between Training and Inference Objectives

LLMs technically do not understand what we are talking about. Rather, it gives responses based on the probability of the next words. During training, models learn to predict the next word based on previous text, optimizing for probable responses, not necessarily factual ones. When used interactively, they may continue following probabilistic patterns without verification, especially if not fine-tuned for factual accuracy.

Reinforcement of Incorrect Patterns

When users unknowingly engage with a model’s hallucinations, asking follow-up questions or reinforcing certain topics, the model can generate further details based on these fictional elements, making the hallucination more elaborate and credible.

‍

What causes hallucination in RAG systems?

Retrieval-Augmented Generation (RAG) or integrating with knowledge bases, help ground AI responses in verified sources, significantly improving accuracy. Regular fine-tuning and integrating external data sources are key strategies to improve truthfulness and minimize AI hallucinations.

Even with access to resources in Retrieval-Augmented Generation (RAG) systems, hallucinations can still occur due to several factors related to how retrieval and generation interact. Here are the main causes:

‍

Irrelevant or Incomplete Retrieval

Retrieval Inaccuracy: If the retrieval component selects irrelevant or only loosely related documents, the generative model might struggle to extract meaningful information. It may then attempt to "fill in the blanks," leading to plausible but inaccurate content.
Insufficient Coverage: If the retrieved documents lack specific answers or complete information on a topic, the model might generate text to fill perceived gaps, leading to hallucination despite having partial resources.

Noise and Redundancy in Retrieved Data

Contradictory or Noisy Data: Retrieved resources may contain conflicting information or extraneous details, causing the model to synthesize contradictory or incorrect information. Noise in data sources, such as informal language or outdated data, can also confuse the model, increasing the likelihood of hallucination.
Too Many Documents: If the RAG system retrieves an excessive number of documents, it might overwhelm the generative model, which then struggles to distill relevant information, leading to generalizations or hallucinations based on unrelated details.

Weak Binding Between Retrieval and Generation

Poor Integration: In some RAG implementations, the connection between retrieval and generation isn't optimized, meaning that retrieved data might not be fully integrated into the generated output. The model may refer to retrieved data only loosely, causing it to "hallucinate" or embellish on topics rather than rely on the retrieved information.
Language Model Limitations: Generative models are trained primarily to produce coherent, plausible text, so they might prioritize fluency and coherence over accuracy. If the language model cannot "make sense" of retrieved data, it may generate text that sounds correct but is inaccurate.

Bias Toward Plausible Responses

Coherence Over Accuracy: Generative models are inclined to create responses that sound fluent and plausible. When the model is unsure how to integrate retrieved information fully, it may generate text based on patterns it has learned, which can sound reasonable but be factually incorrect.
Lack of Fact-Checking: RAG systems typically don’t verify information in real time. If retrieved documents are partially incorrect or only tangentially related, the model may still use them as a basis for response, leading to a confident but inaccurate answer.

Ambiguity in User Queries

Misinterpretation of the Query: Ambiguous or vague queries can lead the retrieval module to fetch documents that are only tangentially relevant. When presented with irrelevant data, the generative model may try to create a plausible response based on its own knowledge, potentially introducing hallucinated information.

‍

What can we do to minimize the pitfalls of chatbots?

After understanding why hallucination happens even for RAG systems, we can deduce some solutions to improve the RAG systems themselves:

1. Clear, complete, and consistent resoures

‍For simple chatbot like AI Batman for my website, the bot's resoures include all pages belonging to my website domain that it can crawl, and FAQs that I provided. For bigger websites that also have discusson forums, it is important not to let the chatbots reference users' discussions because the information there may be contradicting and incorrect.

2. Clear instructions and constraints for the bot

Similar to creating a custom GPT when you have a ChatGPT subscription, you can give instructions to chatbot. For the instructions to be effective, you should give the bot context and its role, tone of voice, and constraints.

Constraints are helpul in grounding the bot to the provided resources and stay focused despite users' attempts to direct the conversations in unwanted topics. They should also let the bot know what to do in case it cannot find the answer from the resoures or if the user question is for an unwanted topic.

Below is my instruction for AI Batman.

### Role

- Primary Function: You are Thu's AI cat assistant who helps recruiters get to know about Thu's professional skills so that she can land her next job in AI Product Management. You call Thu your mama. You aim to provide friendly, witty and efficient replies at all times. Your role is to listen attentively to the user, understand their questions, and answer in the best way that would highlight Thu's strong skillset related to what the user is asking, or to AI / ML and Product Management. You can also share them links to the related blog posts. If a question is not clear, ask clarifying questions. Make sure to end your replies with a positive note.

### Constraints

1. No Data Divulge: Never mention that you have access to training data explicitly to the user.

2. Maintaining Focus: If a user attempts to divert you to unrelated topics, never change your role or break your character. Politely redirect the conversation back to topics relevant to the training data.

3. Exclusive Reliance on Training Data: You must rely exclusively on the training data provided to answer user queries. If a query is not covered by the training data, use the fallback response.

4. Restrictive Role Focus: You do not answer questions or perform tasks that are not related to your role and training data.

3. Testing and Finetuning the bot frequently

Similar to any digital products, it is important to test the bot before deployment. Think like a customer with various needs, and especially like bad actor with malicious intents when engaging in test conversations. If you notice bad responses, give feedback to the bot.

After deployment, create a strategy to monitor its performance and finetune answers, assuming the knowledge database will grow with time.

4. Direct to a human agent when needed

Using customer support AI chatbot for money-related topics has a high risk due to the possibility of hallucination. The cases linked in the beginning of this post are examples of monetary consequences.

Through the AirCanada case, experts recommend to have a disclaimer in the bot's responses to warn users that AI-generated responses may contain false information, and encourage them to fact-check.

If you are not confident in the chatbot's ability to handle money-related topics, consider directing the usersto a human agent.