These answers come from the year-long archive of the chatbot that lived on my previous site, iamnicola.ai. I’ve curated the most useful sessions—real questions from operators exploring AI workflows, experimentation, and conversion work—and lightly edited them so you get the original signal without the noise.

ai workflows

What is RAG?

Retrieval-Augmented Generation (RAG) combines two building blocks: a search step that pulls in trusted information and a generation step that writes the response. Instead of asking an LLM to invent an answer from memory, you give it the exact context it needs.

How it works

  1. Retrieve. Query a vector database or search index to find the most relevant documents, transcripts, or knowledge base entries.
  2. Augment. Package the best snippets into a compact prompt—often with citations or metadata.
  3. Generate. Ask the model to answer using only the supplied context, discouraging speculation.
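The three steps above can be sketched in a few lines of Python. This is a toy example, not a production pipeline: the "index" is an in-memory list scored by keyword overlap instead of a real vector database, and the names (`retrieve`, `build_prompt`, `DOCS`) are illustrative. The final generation call is left as a stub for whatever LLM client you use.

```python
# Toy RAG pipeline: retrieve -> augment -> generate.
# Retrieval here is simple keyword overlap; a real system would
# use embeddings and a vector store.

DOCS = [
    {"id": "refunds", "text": "Refunds are issued within 14 days of purchase."},
    {"id": "shipping", "text": "Standard shipping takes 3 to 5 business days."},
    {"id": "returns", "text": "Returns require the original receipt and packaging."},
]

def retrieve(query: str, docs: list[dict], k: int = 2) -> list[dict]:
    """Step 1: rank documents by keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, snippets: list[dict]) -> str:
    """Step 2: package the best snippets, with citations, into a prompt."""
    context = "\n".join(f"[{s['id']}] {s['text']}" for s in snippets)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

query = "How long do refunds take?"
prompt = build_prompt(query, retrieve(query, DOCS))
# Step 3: send `prompt` to your model of choice, e.g.
# answer = llm.generate(prompt)   # hypothetical client call
print(prompt)
```

The instruction to answer "using ONLY the context below" is what discourages speculation: the model is told to admit when the retrieved snippets don't contain the answer rather than invent one.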

Why teams use RAG

  • Grounded answers. Responses reference your content, so accuracy and tone stay on brand.
  • Fresh knowledge. You can update the index without retraining the model.
  • Smaller prompts. Retrieval keeps prompts focused, lowering token cost.

When it’s the right fit

  • You have evolving documentation (support portals, policy manuals, research notes).
  • You need provenance—every response should show where it came from.
  • Your content mix is structured enough to index (markdown, HTML, transcripts, PDFs).

Use RAG when “check the docs” is part of your team’s workflow. It gives operators and customers fast answers while keeping the model firmly tethered to verified knowledge.

Want to go deeper?

If this answer sparked ideas or you'd like to discuss how it applies to your team, let's connect for a quick strategy call.

Book a Strategy Call