Retrieval-augmented generation (RAG)
Giving an LLM relevant information at query time by retrieving it from your own data, so answers are grounded and current.
What it is
- RAG retrieves relevant documents for a query and includes them in the prompt so the model answers from them.
- It is the most common way to make an LLM answer from private, current, or domain-specific knowledge.
How it works
- Documents are chunked and embedded into a vector store.
- At query time, the question is embedded, the most relevant chunks are retrieved, and they are added to the prompt.
- The model answers grounded in those chunks, ideally with citations.
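The steps above can be sketched end to end in a few lines. This is a minimal illustration, not a production design: the `embed` function is a toy word-count vector standing in for a real embedding model, a plain Python list stands in for the vector store, and all names (`chunk`, `build_index`, `retrieve`, `build_prompt`) and the sample documents are hypothetical. The final call to the model is left out.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Return (source_id, chunk_text, vector) triples; stands in for a vector store."""
    return [(doc_id, c, embed(c)) for doc_id, text in docs.items() for c in chunk(text)]


def retrieve(index, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Embed the question and return the k most similar chunks with their source ids."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(doc_id, text) for doc_id, text, _ in ranked[:k]]


def build_prompt(question: str, hits) -> str:
    """Place the retrieved chunks in the prompt, tagged by source id for citation."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return (
        "Answer using only the sources below and cite them by id.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical sample data.
docs = {
    "refunds": "Refunds are issued within 14 days of purchase when the item is unused.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
}
index = build_index(docs)
question = "How long does shipping take?"
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)  # this string would be sent to the model
```

Tagging each chunk with its source id in the prompt is what makes citations possible: the model can only point back to sources it was shown with an identifier.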
Trade-offs
- For adding knowledge, RAG is usually cheaper than fine-tuning and easier to keep current: updating the indexed documents does not require retraining the model.
- Answer quality depends heavily on retrieval quality: bad retrieval means bad answers.
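Because answers depend on retrieval, it can help to measure retrieval on its own before judging the final answers. The sketch below computes a simple hit rate at k: the fraction of test questions whose known-relevant chunk appears in the top k results. The function name, the chunk ids, and the small labeled set are all hypothetical, and this is only one of several possible retrieval metrics.

```python
def hit_rate_at_k(retrieved: dict[str, list[str]], relevant: dict[str, str], k: int) -> float:
    """Fraction of questions whose known-relevant chunk id is in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for q, gold in relevant.items() if gold in retrieved.get(q, [])[:k])
    return hits / len(relevant)


# Hypothetical ranked results from a retriever, best match first.
retrieved = {
    "How long does shipping take?": ["shipping-1", "refunds-1"],
    "Can I get a refund?": ["shipping-1", "refunds-1"],
    "Do you ship abroad?": ["refunds-2", "faq-3"],
}
# Hypothetical labels: the chunk that actually answers each question.
relevant = {
    "How long does shipping take?": "shipping-1",
    "Can I get a refund?": "refunds-1",
    "Do you ship abroad?": "shipping-2",
}

rate_at_1 = hit_rate_at_k(retrieved, relevant, k=1)  # 1 of 3 questions
rate_at_2 = hit_rate_at_k(retrieved, relevant, k=2)  # 2 of 3 questions
```

A question that misses at every k, like the third one here, cannot be answered from the retrieved context no matter how capable the model is.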
When to use it
- When the model needs to answer from your documents, knowledge base, or frequently changing data.
- When you need citations and the ability to update knowledge without retraining.
Common pitfalls
- Poor chunking or retrieval that misses the relevant passage, leaving the model without the information it needs.
- Assuming RAG eliminates hallucination: it reduces but does not remove it.
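One way chunking can miss a passage is by splitting it across a chunk boundary. A common mitigation, shown here as an assumption rather than something this page prescribes, is to let consecutive chunks overlap so that text near a boundary appears whole in at least one chunk. The function name and the `size` and `overlap` values are hypothetical.

```python
def chunk_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Sliding-window chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


words = "a b c d e f g h i j".split()
chunks = chunk_with_overlap(words, size=4, overlap=2)
# chunks: [a b c d], [c d e f], [e f g h], [g h i j]
```

Overlap increases the number of chunks to store and search, so the chunk size and overlap are worth tuning against a retrieval measurement rather than fixing up front.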