Introduction to RAG Systems (Retrieval-Augmented Generation)

October 28, 2024

Image caption: The image above is a slide from my AI Product Management course with ELVTR. It illustrates the high-level architecture of a RAG system: a prompt is passed to an embedding model, the resulting embedding is used to find relevant information in a vector database populated from documents or other resources, and an LLM then generates the answer.

Have you ever wondered how some companies' customer service chatbots seem so smart and can even carry out requests?

And no, I'm not referring to cases where a live agent is chatting with you. There are chatbots that understand conversations, give company-relevant information, and execute tasks.

To do that, a company typically combines an LLM (Large Language Model), a RAG (Retrieval-Augmented Generation) system, and automation.

In fact, the chatbot Batman on my website is built on an LLM and a basic RAG system. You can ask it anything about me, and the Batman bot will generate a response based on the information on my website and on the FAQs I provided in the backend.

In this article, we will explore RAG.

What is a RAG system?

A RAG system marries two AI approaches: retrieval-based and generative. In a nutshell:

Retrieval: The system first searches a large corpus of documents (often databases or knowledge bases) to find relevant information based on the query.
Generation: Then, it uses a generative model (like GPT) to process and rephrase this information, integrating it into a cohesive and contextually relevant response.
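
To make the two steps concrete, here is a minimal sketch in Python. It is a toy, not a production design: the three documents are made up, the "embedding" is a simple word-count vector standing in for a real embedding model, and generate() is a placeholder where a call to an LLM such as GPT would go.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; a real system would index many documents.
DOCUMENTS = [
    "Checked baggage allowance is 23 kg per passenger on economy fares.",
    "Refunds are processed within 7 business days of cancellation.",
    "Online check-in opens 24 hours before departure.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (assumption: stands in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Retrieval step: return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Combine the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


def generate(prompt: str) -> str:
    """Generation step: placeholder for a real LLM call."""
    return f"[LLM response to a {len(prompt)}-character prompt]"


query = "When does online check-in open?"
passages = retrieve(query, DOCUMENTS)
prompt = build_prompt(query, passages)
print(prompt)
print(generate(prompt))
```

The key design point is that the generative model only sees the question plus the passages that retrieval selected, which is what ties its answer to the company's own information.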

For example, on an airline website, if a customer asks about their booking, the chatbot will first authenticate the user, then the RAG system retrieves the relevant information about their booking and generates a response in a natural conversational format.
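
The airline example can be sketched as three small steps. Everything here is hypothetical: the session tokens, the booking record, and the function names are invented for illustration, and the reply is built from a template where a real chatbot would call an LLM.

```python
# Hypothetical data: a real system would use an identity provider and a booking database.
SESSIONS = {"token-abc": "user-123"}
BOOKINGS = {"user-123": {"flight": "XY101", "date": "2024-11-05", "seat": "14C"}}


def authenticate(token):
    """Return the user id for a valid session token, or None."""
    return SESSIONS.get(token)


def retrieve_booking(user_id):
    """Retrieval step: fetch only the authenticated user's booking."""
    return BOOKINGS.get(user_id)


def generate_reply(booking):
    """Generation step: a template standing in for an LLM call."""
    return (
        f"You're booked on flight {booking['flight']} on {booking['date']}, "
        f"seat {booking['seat']}."
    )


def handle_booking_question(token):
    user_id = authenticate(token)
    if user_id is None:
        return "Please sign in so I can look up your booking."
    booking = retrieve_booking(user_id)
    if booking is None:
        return "I couldn't find a booking on your account."
    return generate_reply(booking)


print(handle_booking_question("token-abc"))
```

Authenticating before retrieving means the bot can only ever look up data belonging to the person it is talking to.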

What is RAG used for?

Retrieval-Augmented Generation (RAG) systems have several key applications, primarily in areas that require highly accurate, contextually relevant, and up-to-date responses. Here are some of the main applications of RAG systems:

Customer Support and Service

RAG systems improve customer support by providing instant, accurate answers based on a company’s knowledge base or previous customer interactions. The system retrieves relevant documents and crafts responses that address customer inquiries directly, reducing wait times and improving user satisfaction.

However, when implementing a RAG system for customer support, we need to take extra care because the AI can hallucinate, meaning it can produce confident answers that the retrieved documents do not support.
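
One possible safeguard, sketched below under assumptions, is to answer only when retrieval found something sufficiently relevant and otherwise hand off to a human. The threshold value, the fallback message, and the function names are all hypothetical, and a check like this reduces the risk rather than removing it.

```python
FALLBACK = "I'm not sure about that. Let me connect you with a human agent."
MIN_SCORE = 0.5  # hypothetical relevance threshold; would need tuning on real data


def answer_or_escalate(scored_passages, generate):
    """Answer only from passages that pass the relevance threshold.

    scored_passages: list of (relevance_score, passage_text) pairs from retrieval.
    generate: a function that turns a list of passages into a reply (an LLM call
    in a real system).
    """
    grounded = [text for score, text in scored_passages if score >= MIN_SCORE]
    if not grounded:
        # Nothing relevant was retrieved, so do not let the model guess.
        return FALLBACK
    return generate(grounded)


# Usage with a stand-in generator that simply echoes the passages.
echo = lambda passages: " ".join(passages)
print(answer_or_escalate([(0.9, "Refunds take 7 business days.")], echo))
print(answer_or_escalate([(0.2, "Unrelated text.")], echo))
```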

Healthcare and Medical Information

In healthcare, RAG systems can assist in providing up-to-date medical information to professionals and patients by retrieving relevant research, studies, and case reports. They can offer summaries on treatments, symptoms, or drug interactions, improving the speed and accuracy of information without replacing medical advice.

Legal and Compliance Support

Law firms and compliance departments use RAG systems to quickly retrieve relevant case law, regulations, or compliance documentation, generating tailored responses that assist legal professionals with research or document preparation. This allows them to handle complex legal queries efficiently.

Financial Services and Market Research

Financial analysts benefit from RAG systems by getting real-time insights from market reports, financial news, and historical data. The system retrieves and summarizes relevant documents, aiding in market analysis, risk assessment, and investment strategy with information that’s both precise and current.

Enterprise Knowledge Management

RAG systems can streamline access to internal company knowledge, policies, or archived documents. By retrieving specific internal data, these systems help employees find relevant information quickly, leading to increased productivity and better-informed decision-making.

An out-of-the-box example of a basic RAG system is Gemini for Google Workspace, whose answers depend on the documents the user has access to. At the time of writing, however, this system has significant drawbacks that make its answers unreliable. Read more about those drawbacks in my Case Study: My suggestions for Gemini for Google Workspace - June 28, 2024
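
The idea that answers depend on what the user can access can be sketched as a permission filter applied before retrieval. This is not a description of how Gemini works internally; the documents, group names, and keyword-overlap scoring below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    title: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document


# Hypothetical internal documents.
DOCS = [
    Doc("Travel policy",
        "Employees may book economy class for flights under six hours.",
        frozenset({"all-staff"})),
    Doc("Salary bands",
        "Salary bands are reviewed every April.",
        frozenset({"hr"})),
]


def visible_docs(user_groups, docs):
    """Keep only the documents that at least one of the user's groups can read."""
    return [d for d in docs if d.allowed_groups & set(user_groups)]


def retrieve(query, user_groups, docs):
    """Rank the user's visible documents by simple keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = [
        (len(terms & set(d.text.lower().split())), d)
        for d in visible_docs(user_groups, docs)
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for score, d in ranked if score > 0]


# The same question returns different results for different users.
print([d.title for d in retrieve("salary bands review", {"hr"}, DOCS)])
print([d.title for d in retrieve("salary bands review", {"all-staff"}, DOCS)])
```

Filtering before retrieval, rather than after generation, keeps restricted text out of the prompt entirely.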

Educational Tools and Research Assistance

RAG systems serve as educational tools by retrieving academic papers, textbooks, or lecture notes to generate comprehensive summaries and answer student questions. Researchers also use RAG systems for literature reviews, accessing a broad spectrum of information and generating insights more efficiently.

Document Summarization and Report Generation

Businesses and governments use RAG systems to automate the summarization of long reports or databases, creating accessible summaries or actionable insights. This is especially valuable in industries with extensive data, such as government reports, environmental studies, or corporate strategy analysis.

Why use RAG?

Enhanced Accuracy: RAG systems access up-to-date information, making responses more reliable for dynamic topics.

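Cost-Effectiveness: Updating the knowledge base is generally cheaper than retraining or fine-tuning the generative model on new information, and passing the model only the most relevant retrieved passages, rather than entire documents, keeps prompts short and helps response times.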

Relevancy and Rich Context: RAG blends the specificity of retrieved facts with the adaptability of generative language models, producing tailored, insightful answers.

Key Takeaways

In theory, RAG systems are useful in areas that require high accuracy and contextual relevancy in responses. They excel at providing relevant content and can benefit both companies and users.

However, because hallucination is still possible, we need to take extra care, especially when the bot is public-facing and its mistakes could have a financial impact.