RAG with Spring AI

A hands-on series on building Retrieval-Augmented Generation (RAG) systems with Spring AI. Each post maps to a runnable demo in the rag-spring-ai project, from a minimal retrieval pipeline to production patterns like multi-format ingestion, conversational memory, advisor composition, structured output, and function calling. Everything runs locally with Ollama and Docker Compose.

Metadata Filtering with Spring AI: A WHERE Clause for Your Vector Store

May 06, 2026

In the last post we fixed context pollution by giving every document domain its own VectorStore. One bucket for FAQ, one for legal, one for tech, one for HR — done. The router picks one and the LLM gets a...

Multi-Document RAG with Spring AI: Multiple Collections, Smart Routing, and Cleaner Top-K

May 04, 2026

Every demo in this series so far has lived inside one cosy little vector store. We dumped a CloudFlow FAQ into it, asked some questions, and got nice answers. That’s also pretty much how every “your first RAG app” tutorial...

Function Calling in Spring AI: Letting the LLM Press the Buttons

May 01, 2026

So far in this series the LLM has been a very polite librarian — we ask it questions, it goes to the vector store, it reads us a nicely worded answer. That’s RAG. It’s great. It’s also, eventually, not enough....

Structured Output in Spring AI: Turning LLM Prose into Typed Java Records

April 28, 2026

Up to now, every demo in this series has happily returned a String from the LLM and called it a day. That’s fine when you’re building a chatbot — humans are great at reading prose. It is not fine when...

Advisors in Spring AI: Composing RAG, Memory, and Safety as a Pipeline

April 25, 2026

So far in this series we’ve quietly been using a feature without ever really stopping to look at it. Every demo — basic RAG, ingestion, vector store ops, chat memory — has been built around ChatClient and these little things...

Chat with Memory in Spring AI: Conversational RAG That Actually Remembers

April 21, 2026

So far in this series we’ve built a basic RAG pipeline, loaded a few different document formats, and poked at the vector store directly to understand what retrieval actually returns.

Vector Store Operations with Spring AI: Similarity Search, Thresholds, and Embedding Inspection

April 19, 2026

In the first post we built a basic RAG pipeline, and in the second post we explored different ways to ingest documents. Both times, we let QuestionAnswerAdvisor handle the retrieval for us — it searched the vector store, grabbed the...

Document Ingestion with Spring AI: Loading Text, JSON, and Custom Chunks into Your RAG Pipeline

April 15, 2026

In the first post we built a basic RAG system — one text file, default chunking, done. It worked great for a quick demo, but real-world documents don’t come in neat .txt files. You’ll deal with JSON exports, PDFs, maybe...

Basic RAG with Spring AI: Build a Grounded Q&A System from Scratch

April 11, 2026

Large language models are impressive, but they have a fundamental limitation: they can only work with what they learned during training. Ask about your company’s internal docs, last week’s release notes, or anything after the training cutoff — and the...