Retrieval-augmented generation (RAG)

Giving an LLM relevant information at query time by retrieving it from your own data, so answers are grounded and current.

What it is

  • RAG retrieves relevant documents for a query and includes them in the prompt so the model answers from them.
  • It is the most common way to make an LLM answer from private, current, or domain-specific knowledge.

How it works

  • Documents are chunked and embedded into a vector store.
  • At query time, the question is embedded, the most relevant chunks are retrieved, and they are added to the prompt.
  • The model answers grounded in those chunks, ideally with citations.
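
The steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production pipeline: the `embed` function is a toy word-count vector standing in for a real embedding model, the "vector store" is a plain Python list, and the document names, function names, and prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: sparse word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Index step: chunk every document and store (source, chunk, vector)."""
    return [(name, c, embed(c)) for name, text in docs.items() for c in chunk(text)]


def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Query step: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(name, text) for name, text, _ in ranked[:k]]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Add the retrieved chunks to the prompt as numbered, citable sources."""
    context = "\n".join(f"[{i}] ({name}) {text}" for i, (name, text) in enumerate(hits, 1))
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical documents, used only for illustration.
docs = {
    "returns.md": "Customers may return items within 30 days of delivery for a full refund.",
    "shipping.md": "Standard shipping takes 3 to 5 business days within the country.",
}
question = "How long do I have to return items for a refund?"
index = build_index(docs)
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

The resulting `prompt` string is what would be sent to the model. Numbering the sources in the prompt is one simple way to make citations possible.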

Trade-offs

  • Usually cheaper than fine-tuning for adding knowledge, and easier to keep current: you update the indexed documents instead of retraining the model.
  • Answer quality depends heavily on retrieval quality: bad retrieval means bad answers.
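
Because answers depend on retrieval, it can help to measure retrieval on its own before judging the final answers. One common metric is recall@k: the share of known-relevant chunks that appear in the top k results. The sketch below assumes you have a small set of questions labelled by hand with the chunk ids that should be retrieved; the ids shown are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical labelled example: ranked retriever output and the ids
# a human marked as relevant for the same question.
retrieved_ids = ["c7", "c2", "c9", "c4"]
relevant_ids = {"c2", "c4"}

score_at_2 = recall_at_k(retrieved_ids, relevant_ids, k=2)  # 0.5
score_at_4 = recall_at_k(retrieved_ids, relevant_ids, k=4)  # 1.0
```

A low recall@k at the k you actually pass to the model suggests the problem lies in chunking or retrieval rather than in the model's generation.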

When to use it

  • When the model needs to answer from your documents, knowledge base, or frequently changing data.
  • When you need citations and the ability to update knowledge without retraining.

Common pitfalls

  • Poor chunking or retrieval that misses the relevant passage.
  • Assuming RAG eliminates hallucination: it reduces but does not remove it.
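
One way a relevant passage gets missed is when it is split across a chunk boundary. A common mitigation, shown here only as a sketch, is to let consecutive chunks overlap so that text near a boundary appears whole in at least one chunk. The function name and the `size` and `overlap` values are illustrative assumptions, not recommended settings.

```python
def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split words into windows of `size` words that share `overlap` words with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


words = "a b c d e f g h".split()
chunks = chunk_with_overlap(words, size=4, overlap=2)
# chunks == [['a','b','c','d'], ['c','d','e','f'], ['e','f','g','h']]
```

Overlap increases index size and can return near-duplicate chunks, so it is a trade-off rather than a guaranteed fix.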
