RAG (retrieval-augmented generation)

What is RAG (retrieval-augmented generation)?

Alex Grygoriev · Updated Jun 5, 2026 · 1 min read

Definition

RAG (retrieval-augmented generation) is a technique that improves an LLM's answers by retrieving relevant documents from a knowledge base at query time and adding them to the prompt. The model then answers from that retrieved, current data, which reduces hallucinations and lets it draw on and cite private or up-to-date information.

RAG (retrieval-augmented generation) is one of the most important patterns in applied AI. It connects an LLM to your own data so it can answer questions about documents, products or processes the model was never trained on.

Why RAG exists

An LLM's knowledge is frozen at training time and it can't see your private documents. Ask it about your company's internal policy and it will either refuse or invent an answer. RAG fixes this by giving the model the right source material at the moment of the question.

How RAG works

Index: your documents are split into chunks, converted into embeddings and stored in a vector database.
Retrieve: when a question comes in, the system finds the most relevant chunks (by semantic similarity, often combined with keyword search).
Augment & generate: those chunks are inserted into the prompt, and the LLM answers using them, ideally with citations.
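
The Index and Retrieve steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: a bag-of-words counter stands in for a real embedding model, a plain list stands in for a vector database, and the function names (chunk, embed, build_index, retrieve) and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Index step: chunk every document and store (chunk, vector) pairs."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    """Retrieve step: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical sample documents, invented for this illustration.
DOCS = [
    "Employees may work remotely up to three days per week with manager approval.",
    "Expense reports must be submitted within 30 days of the purchase date.",
    "The office kitchen is cleaned every Friday afternoon.",
]

index = build_index(DOCS)
top_chunks = retrieve(index, "How many days per week can I work remotely?", k=1)
print(top_chunks)
```

In a real system, embed would call an embedding model and the index would live in a vector database. The retrieved chunks are what the final step inserts into the prompt.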

RAG vs. fine-tuning

Fine-tuning changes the model's weights to teach it a style or skill. RAG leaves the model alone and changes the information it sees. For keeping answers current and factual, RAG is usually cheaper, faster to update and easier to audit, because you can see exactly which source produced an answer.

Where RAG is used

RAG powers support assistants that answer from your help center, internal "chat with your docs" tools, and any AI agent that needs grounded, trustworthy knowledge. Good retrieval, typically hybrid search that blends vector similarity with keyword matching, is what separates a reliable RAG system from a flaky one.
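
One common way to blend a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The source does not say which blending method to use, so RRF is an assumption here. The document IDs are hypothetical, and k=60 is a commonly used constant rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in,
    and the scores are summed across lists.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from two separate retrievers for the same query.
vector_ranking = ["doc_refunds", "doc_returns", "doc_shipping"]
keyword_ranking = ["doc_returns", "doc_warranty", "doc_refunds"]

fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still stays in the candidate set.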

Summary

RAG grounds an LLM in real data: retrieve the right facts, add them to the prompt, generate a grounded answer. It is a widely used way to make AI more accurate, current and trustworthy on your own content.

Frequently asked questions

Does RAG stop AI from hallucinating?

It greatly reduces hallucinations by grounding answers in retrieved sources, but it does not eliminate them entirely. Quality depends on good retrieval and on prompting the model to answer only from the provided context.
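
Prompting the model to answer only from the provided context might look like the sketch below. The function name build_grounded_prompt, the instruction wording and the sample chunk are all assumptions chosen for illustration. The resulting string would be sent to whichever LLM API you use.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to numbered sources."""
    sources = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using only the sources below.\n"
        "Cite the source number, such as [1], after each claim.\n"
        "If the sources do not contain the answer, reply: I don't know.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved chunk, invented for this illustration.
prompt = build_grounded_prompt(
    "How long do I have to submit an expense report?",
    ["Expense reports must be submitted within 30 days of the purchase date."],
)
print(prompt)
```

Numbering the sources gives the model something concrete to cite, and the explicit fallback gives it an alternative to inventing an answer when retrieval comes back empty or off-topic.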

Is RAG better than fine-tuning?

For keeping knowledge current and factual, RAG is usually the better first choice because it is cheaper and easier to audit. Fine-tuning is better for teaching a fixed style or format. Many systems combine both.
