Image caption: The image above is a slide from my AI Product Management course with ELVTR. It illustrates the high-level architecture of a RAG system: a prompt is passed to the embedding model, and the resulting embedding is used to find relevant information in the vector database, which is populated from documents or other resources. An LLM then generates the answer / output.
Have you ever wondered how some companies' customer service chatbots seem so smart and can even carry out requests?
And no, I'm not referring to cases where a person is chatting with you live. There are chatbots that understand conversations, give company-relevant information, and execute tasks.
To do that, a company typically leverages an LLM (Large Language Model), a RAG (Retrieval-Augmented Generation) system, and automation.
In fact, the chatbot Batman on my website combines an LLM with a basic RAG system. You can ask it anything about me, and Batman bot will generate a response based on the information on my website and on the FAQs I provided in the backend.
In this article, we will explore RAG.
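A RAG system marries two AI approaches: retrieval-based and generative. In a nutshell, it first retrieves information relevant to the request, then uses an LLM to generate a response based on that information.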
For example, on an airline website, if a customer asks about their booking, a RAG system will first authenticate the user, retrieve relevant information about their booking, then generate a response in a natural conversational format.
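To make the retrieve-then-generate flow concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: `embed` is a toy word-count stand-in for a real embedding model, the `DOCS` list stands in for a vector database, and the sample booking data is made up. The final step, sending the prompt to an LLM, is left out because it depends on the provider you use.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts (an assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Retrieval step: return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, context):
    """Augmentation step: put the retrieved text in front of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Hypothetical knowledge base for an airline chatbot.
DOCS = [
    "Booking ABC123: flight to Tokyo departs 10 May at 09:30.",
    "Checked baggage allowance is 23 kg per passenger.",
    "Lounge access is available to business class travellers.",
]

query = "When does my flight to Tokyo depart?"
context = retrieve(query, DOCS, k=1)
prompt = build_prompt(query, context)  # this prompt would then be sent to an LLM
print(prompt)
```

In a real system the word counts would be replaced by embeddings from a trained model, and authentication would happen before any booking data is retrieved.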
Retrieval-Augmented Generation (RAG) systems have several key applications, primarily in areas that require highly accurate, contextually relevant, and up-to-date responses. Here are some of the main applications of RAG systems:
RAG systems improve customer support by providing instant, accurate answers based on a company’s knowledge base or previous customer interactions. The system retrieves relevant documents and crafts responses that address customer inquiries directly, reducing wait times and improving user satisfaction.
However, when implementing a RAG system for customer support, we need to take extra care because the AI can hallucinate, that is, generate answers that sound plausible but are wrong.
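One common way to reduce that risk is to let the bot answer only when retrieval finds something clearly relevant, and hand the conversation to a human otherwise. The sketch below is an assumption-heavy illustration, not a complete safeguard: `overlap_score` is a simple word-overlap measure, the `min_score` threshold of 0.2 is arbitrary, and the FAQ entries are made up.

```python
import re

def tokens(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(query, passage):
    """Jaccard overlap between query words and passage words (0.0 to 1.0)."""
    q, p = tokens(query), tokens(passage)
    return len(q & p) / len(q | p) if q | p else 0.0

def answer_or_escalate(query, passages, min_score=0.2):
    """Answer only when the best passage is relevant enough; otherwise escalate."""
    best = max(passages, key=lambda p: overlap_score(query, p), default=None)
    if best is None or overlap_score(query, best) < min_score:
        return {"action": "escalate", "context": None}
    return {"action": "answer", "context": best}

# Hypothetical FAQ entries.
FAQ = [
    "Refunds are issued within 7 days of cancellation.",
    "Checked baggage allowance is 23 kg per passenger.",
]

grounded = answer_or_escalate("How many days until refunds are issued?", FAQ)
ungrounded = answer_or_escalate("Can I bring my pet iguana on board?", FAQ)
print(grounded["action"], ungrounded["action"])
```

A check like this does not stop the LLM from misreading the retrieved text, so it complements human review rather than replacing it.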
In healthcare, RAG systems can assist in providing up-to-date medical information to professionals and patients by retrieving relevant research, studies, and case reports. They can offer summaries on treatments, symptoms, or drug interactions, improving the speed and accuracy of information without replacing medical advice.
Law firms and compliance departments use RAG systems to quickly retrieve relevant case law, regulations, or compliance documentation, generating tailored responses that assist legal professionals with research or document preparation. This allows them to handle complex legal queries efficiently.
Financial analysts benefit from RAG systems by getting real-time insights from market reports, financial news, and historical data. The system retrieves and summarizes relevant documents, aiding in market analysis, risk assessment, and investment strategy with information that’s both precise and current.
RAG systems can streamline access to internal company knowledge, policies, or archived documents. By retrieving specific internal data, these systems help employees find relevant information quickly, leading to increased productivity and better-informed decision-making.
An out-of-the-box basic RAG system is Gemini for Google Workspace. The answers that Gemini gives also depend on the documents that the user has access to. However, it currently has significant drawbacks that make its answers unreliable. Read more about those drawbacks in my Case Study: My suggestions for Gemini for Google Workspace - June 28, 2024
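The idea that answers depend on what the user can access can be sketched as a permission filter applied before retrieval. This is a hypothetical illustration and says nothing about how Gemini is actually implemented: the `Doc` class, the group names, and the keyword search are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document

def visible_docs(docs, user_groups):
    """Keep only documents shared with at least one of the user's groups."""
    return [d for d in docs if d.allowed_groups & user_groups]

def search(docs, user_groups, keyword):
    """Search only within the documents the user is allowed to see."""
    kw = keyword.lower()
    return [d.title for d in visible_docs(docs, user_groups) if kw in d.text.lower()]

# Hypothetical internal documents.
DOCS = [
    Doc("Travel policy", "Employees may book economy flights.", frozenset({"all-staff"})),
    Doc("Salary bands", "Confidential salary ranges and flights budget.", frozenset({"hr"})),
]

staff_hits = search(DOCS, {"all-staff"}, "flights")
hr_hits = search(DOCS, {"all-staff", "hr"}, "flights")
print(staff_hits, hr_hits)
```

Filtering before retrieval means restricted text never reaches the LLM, so it cannot leak into a generated answer.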
RAG models serve as educational tools by retrieving academic papers, textbooks, or lecture notes to generate comprehensive summaries and answer student questions. Researchers also use RAG systems for literature reviews, accessing a broad spectrum of information and generating insights more efficiently.
Businesses and governments use RAG systems to automate the summarization of long reports or databases, creating accessible summaries or actionable insights. This is especially valuable in industries with extensive data, such as government reports, environmental studies, or corporate strategy analysis.
In theory, RAG systems are useful in areas that require high accuracy and contextual relevance in responses. They excel at providing relevant content and can benefit both companies and users.
However, because hallucination is possible, we need to take extra care, especially when the bot is public-facing and its mistakes could have a financial impact.