What is RAG (retrieval-augmented generation)?
Definition
RAG (retrieval-augmented generation) is a technique that improves an LLM's answers by retrieving relevant documents from a knowledge base at query time and adding them to the prompt. The model then answers from that retrieved material rather than from its training data alone, which reduces hallucinations and lets it cite private or up-to-date information.
RAG (retrieval-augmented generation) is one of the most important patterns in applied AI. It connects an LLM to your own data so it can answer questions about documents, products or processes the model was never trained on.
Why RAG exists
An LLM's knowledge is frozen at training time, and it can't see your private documents. Ask it about your company's internal policy and it will typically either refuse or invent an answer. RAG fixes this by giving the model the right source material at the moment of the question.
How RAG works
- Index: your documents are split into chunks and converted into embeddings, stored in a vector database.
- Retrieve: when a question comes in, the system finds the most relevant chunks (by semantic similarity, often combined with keyword search).
- Augment & generate: those chunks are inserted into the prompt, and the LLM answers using them — ideally with citations.
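The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: `embed` is a toy bag-of-words stand-in for a real embedding model, the "vector database" is a plain list, and the document ids, sample texts and function names are all hypothetical. The sketch stops at the finished prompt; sending it to an LLM is left out.

```python
import math
import re
from collections import Counter

def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Index step, part 1: split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """ASSUMPTION: toy stand-in for an embedding model (bag-of-words counts)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(documents: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Index step, part 2: store (doc_id, chunk, vector) entries in a list."""
    return [
        (doc_id, chunk, embed(chunk))
        for doc_id, text in documents.items()
        for chunk in chunk_text(text)
    ]

def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Retrieve step: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]

def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Augment step: insert the retrieved chunks, tagged with their source id."""
    context = "\n".join(f"[{doc_id}] {chunk}" for doc_id, chunk in hits)
    return f"Sources:\n{context}\n\nQuestion: {question}"

# Made-up sample documents, for illustration only.
DOCS = {
    "vacation-policy": "Employees receive 30 vacation days per year. Unused vacation days expire in March.",
    "expense-policy": "Travel expenses must be submitted within 14 days with receipts attached.",
}

index = build_index(DOCS)
question = "How many vacation days do employees receive?"
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
```

In a real system the toy `embed` would be replaced by an embedding model and the list by a vector database, but the shape of the pipeline stays the same.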
RAG vs. fine-tuning
Fine-tuning changes the model's weights to teach it a style or skill. RAG leaves the model alone and changes the information it sees. For keeping answers current and factual, RAG is usually cheaper, faster to update and easier to audit — you can see exactly which source produced an answer.
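A small sketch may make the "faster to update, easier to audit" point concrete. It assumes a hypothetical in-memory `KnowledgeBase` with naive keyword matching in place of a real vector database; the class, the source id and the prices are invented for illustration. Changing what the system knows is a single `upsert` call with no retraining, and every retrieved passage carries the id of the source it came from.

```python
from dataclasses import dataclass

@dataclass
class Retrieved:
    source_id: str  # which document the passage came from (the audit trail)
    text: str

class KnowledgeBase:
    """Hypothetical in-memory store; a real system would use a vector database."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert(self, source_id: str, text: str) -> None:
        """Add or replace a document. No model weights are touched."""
        self._docs[source_id] = text

    def search(self, keyword: str) -> list[Retrieved]:
        """Naive case-insensitive substring match, standing in for real retrieval."""
        kw = keyword.lower()
        return [Retrieved(sid, text) for sid, text in self._docs.items() if kw in text.lower()]

kb = KnowledgeBase()
kb.upsert("pricing-page", "The Pro plan costs 20 EUR per month.")  # made-up data
before = kb.search("pro plan")

# The price changes: update the document, and the next query sees the new fact.
kb.upsert("pricing-page", "The Pro plan costs 25 EUR per month.")
after = kb.search("pro plan")
```

Achieving the same change through fine-tuning would mean preparing training data and running another training job, and the resulting answer could not be traced back to a single source document.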
Where RAG is used
RAG is used in support assistants that answer from your help center, internal "chat with your docs" tools, and any AI agent that needs grounded, trustworthy knowledge. Good retrieval, in particular hybrid search that blends vector similarity with keyword matching, is what separates a reliable RAG system from a flaky one.
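One common way to blend the two result lists is reciprocal rank fusion (RRF), sketched below. This is one possible fusion method chosen for illustration, not the only way to build hybrid search. The chunk ids and both rankings are hypothetical inputs; in practice they would come from a vector search and a keyword search over the same chunks. The constant `k = 60` is a conventional default, assumed here.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into one list.

    Each chunk scores 1 / (k + rank) for every list it appears in, so chunks
    ranked well by both searches rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by chunk id for a stable order.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))

# Hypothetical results for the same question from two retrievers.
vector_ranking = ["c2", "c1", "c3"]   # ordered by semantic similarity
keyword_ranking = ["c1", "c4", "c2"]  # ordered by keyword match

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
# fused == ["c1", "c2", "c4", "c3"]
```

RRF works on ranks rather than raw scores, which avoids having to normalise similarity values and keyword scores onto a common scale.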
Summary
RAG grounds an LLM in real data: retrieve the right facts, add them to the prompt, generate a grounded answer. It is a widely used way to make AI answers accurate, current and trustworthy on your own content.
Frequently asked questions
Does RAG stop AI from hallucinating?
It greatly reduces hallucinations by grounding answers in retrieved sources, but it does not eliminate them entirely. Quality depends on good retrieval and prompting the model to answer only from the provided context.
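As an illustration of "answer only from the provided context", the sketch below builds such a prompt. The wording of the instructions and the function name `grounded_prompt` are assumptions; the exact phrasing that works best varies by model, and an instruction like this lowers the risk of invented answers without guaranteeing it.

```python
def grounded_prompt(question: str, chunks: list[str]) -> str:
    """Build a prompt that restricts the model to the retrieved chunks."""
    if chunks:
        # Number the sources so the model can cite them.
        context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    else:
        # Make an empty retrieval explicit instead of leaving a blank section.
        context = "(no sources retrieved)"
    return (
        "Answer the question using ONLY the numbered sources below.\n"
        "Cite the source number for every claim.\n"
        'If the sources do not contain the answer, reply exactly: "I don\'t know."\n\n'
        f"Sources:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Made-up example chunk, for illustration only.
prompt = grounded_prompt(
    "When do unused vacation days expire?",
    ["Unused vacation days expire in March."],
)
```

Giving the model an explicit way out ("I don't know.") matters: without it, a model asked a question the sources cannot answer is more likely to fall back on guessing.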
Is RAG better than fine-tuning?
For keeping knowledge current and factual, RAG is usually the better first choice — it is cheaper and auditable. Fine-tuning is better for teaching a fixed style or format. Many systems combine both.