Retrieval-augmented generation (RAG)

Giving an LLM relevant information at query time by retrieving it from your own data, so answers are grounded and current.

What it is

RAG retrieves relevant documents for a query and includes them in the prompt so the model answers from them.
It is the most common way to make an LLM answer from private, current, or domain-specific knowledge.

Documents are chunked and embedded into a vector store.
At query time, the question is embedded, the most relevant chunks are retrieved, and they are added to the prompt.
The model answers grounded in those chunks, ideally with citations.
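
The flow above can be sketched end to end in a few lines. This is a minimal illustration under loud assumptions, not a production design: the "embedding" is a plain word-count vector standing in for a real embedding model, the "vector store" is an in-memory list, and every name here (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`, `DOCS`) is hypothetical. The final prompt would be sent to whatever LLM you use; that call is left out.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Chunk and embed every document; the index is a list of (source id, text, vector)."""
    index = []
    for doc_id, text in documents.items():
        for n, piece in enumerate(chunk(text)):
            index.append((f"{doc_id}#{n}", piece, embed(piece)))
    return index


def retrieve(index, question: str, k: int = 2):
    """Embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, hits) -> str:
    """Augment the prompt with numbered sources so the model can cite them."""
    context = "\n".join(
        f"[{i}] ({source}) {text}" for i, (source, text, _) in enumerate(hits, 1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical sample data for illustration only.
DOCS = {
    "refunds": "Refunds are issued within 14 days of purchase when the item is unused.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
    "warranty": "The warranty covers manufacturing defects for two years.",
}

index = build_index(DOCS)
question = "When are refunds issued after purchase?"
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Numbering the sources in the prompt is one simple way to make citations possible: the model can refer to `[1]`, and the application can map that number back to the source id.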

Retrieval and augmentation are what make it RAG.

For adding knowledge, RAG is cheaper than fine-tuning, keeps answers more current, and is easier to update.
The tradeoff is that answer quality depends heavily on retrieval quality: bad retrieval means bad answers.
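
Because answers can only be as good as the retrieved chunks, it can help to measure retrieval separately from generation. The sketch below computes a hit rate at k: the fraction of labelled questions whose expected chunk appears among the top k results. Everything in it is an assumption for illustration: the retriever ranks by simple word overlap rather than embeddings, and `CHUNKS`, `LABELLED`, and the function names are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word set used by the toy retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks: dict[str, str], question: str, k: int) -> list[str]:
    """Toy retriever (assumption): rank chunk ids by word overlap with the question."""
    q = tokens(question)
    ranked = sorted(chunks, key=lambda cid: len(q & tokens(chunks[cid])), reverse=True)
    return ranked[:k]


def hit_rate_at_k(chunks: dict[str, str], labelled: list[tuple[str, str]], k: int) -> float:
    """Fraction of questions whose expected chunk id is in the top k results."""
    hits = sum(
        1 for question, expected in labelled if expected in retrieve(chunks, question, k)
    )
    return hits / len(labelled)


# Hypothetical chunks and hand-labelled questions.
CHUNKS = {
    "refunds": "Refunds are issued within 14 days of purchase when the item is unused.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
    "warranty": "The warranty covers manufacturing defects for two years.",
}

LABELLED = [
    ("When are refunds issued after purchase?", "refunds"),
    ("How long does standard shipping take?", "shipping"),
    # Worded like the refunds chunk, but the right answer lives in the warranty chunk.
    ("Is a broken item covered within days of purchase?", "warranty"),
]

rate_at_1 = hit_rate_at_k(CHUNKS, LABELLED, k=1)
rate_at_3 = hit_rate_at_k(CHUNKS, LABELLED, k=3)
print(f"hit rate @1: {rate_at_1:.2f}, @3: {rate_at_3:.2f}")
```

On this toy data the third question is a miss at k=1: its wording overlaps most with the refunds chunk, so the model would be handed the wrong source. That is the "bad retrieval means bad answers" failure in miniature.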

Use RAG when the model needs to answer from your documents, knowledge base, or frequently changing data.
It also fits when you need citations and the ability to update knowledge without retraining.

