Agent-to-Agent with Spring AI: Two Agents, One Conversation, Zero Magic
I’ve spent a lot of the last few posts on single-agent stuff — RAG, memory, advisors, the whole “one ChatClient does everything” pipeline. That works great until your one agent starts to look like a kitchen drawer: every tool jammed in there, every prompt trying to be all things to all people. At some point you want to break it apart and let specialised agents do specialised things.
That’s what the Agent-to-Agent (A2A) protocol is for. And Spring AI has a starter for it that, honestly, is way less scary than the spec might make you think. So in this post I’ll walk through a tiny demo I built — order-status — that has two agents collaborating to answer one customer question. No frameworks-on-top-of-frameworks, no orchestrator-of-orchestrators, just two Spring Boot processes talking to each other in JSON.
The whole question is:
“Where is my order ORD-1003?”
And the whole answer involves two agents quietly cooperating without the caller ever knowing.
1. Why bother with a protocol at all?
Honest question: if I have two Spring Boot services and I want one to call the other, why not just expose a REST endpoint and call it a day? I do that every week.
The reason A2A is interesting is that it standardises the bits that we always end up reinventing when we string LLMs together:
- Discovery: how do you find out what an agent can do, and on which input/output modes? A2A gives you GET /.well-known/agent-card.json.
- Skills: the unit of work isn’t “an HTTP endpoint”, it’s “a named skill” with a description, examples and tags. The exact things an LLM needs to decide whether to call you.
- Tasks: every call returns a Task with a state (submitted → working → completed/failed) and a list of Artifacts. Same shape whether you answered in 5ms or streamed for 30s.
- Streaming: message/send for a plain request/response call, message/stream for SSE. Same data model on both sides.
In short: it’s the bare minimum contract two LLM-shaped systems need to talk to each other without coupling. You can argue it’s overkill for two of your own services. You can’t really argue it’s overkill the moment one of those services is provided by someone else.
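To make the discovery bullet concrete, here’s a minimal sketch of fetching an agent card with nothing but the JDK’s HTTP client. The class and method names (AgentCardDiscovery, cardUri, fetchCardJson) are invented for illustration; only the well-known path comes from the list above. In the demo itself the Spring AI client does this for you.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of Spring AI: plain-JDK agent-card discovery.
class AgentCardDiscovery {

    // The discovery path named in the list above.
    static final String WELL_KNOWN_PATH = "/.well-known/agent-card.json";

    // Builds the card URL from an agent's base URL, tolerating a trailing slash.
    static URI cardUri(String baseUrl) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(base + WELL_KNOWN_PATH);
    }

    // Fetches the raw card JSON; parsing is left to whatever JSON library you use.
    static String fetchCardJson(String baseUrl) throws IOException, InterruptedException {
        HttpClient http = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(cardUri(baseUrl))
                .timeout(Duration.ofSeconds(5))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Agent card request failed with HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling AgentCardDiscovery.fetchCardJson("http://localhost:9600") against a running agent should return the card as a JSON string; what you do with it (cache it, feed it to an LLM, validate it) is up to the caller.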
2. The toy: one customer question, two agents
Here’s what we’re building. Two specialised agents, one client, three JVMs.
The order-status agent is both a server and a client at the same time, which is what makes A2A composable.
The roles:
| Profile | Role | Port |
|---|---|---|
| order-shipping | A2A server: exposes the track-shipment skill | 9600 |
| order-status | A2A server to its caller and A2A client of shipping-tracker | 9601 |
| order-client | A tiny CLI client: sends one message, prints the answer, exits | random |
The big idea is in row two: the order-status agent is simultaneously a server and a client. It answers the client’s question, and to do that it goes and asks another agent for help. The client never knows there’s a second agent involved. That encapsulation — “I just call one agent, and it figures it out” — is the whole point of A2A composition. You can stack this however deep you need.
3. What Spring AI gives you for free
The Spring AI A2A starter is doing most of the boring HTTP work. Here’s what you actually get:
- Server side: an A2aController that wires up POST /a2a (sync) and POST /a2a/stream (SSE), plus GET /.well-known/agent-card.json for discovery. A TaskStore keeps track of in-flight and completed tasks. All of this is auto-configured the moment you set spring.ai.a2a.server.enabled=true.
- A SkillHandler interface: the only code you have to write to expose a skill. The work happens in one method, Task handle(Message request, Task seed); a second, skillId(), just names the skill for routing. That’s the whole strategy interface.
- Client side: an A2aClient (built via A2aClientBuilder) with four methods: fetchAgentCard, sendMessage, streamMessage, getTask. You point it at a base URL and it does the JSON-RPC marshalling for you.
- The protocol records: AgentCard, AgentSkill, Message, Part (Text / Data / File, sealed), Task, TaskEvent. Immutable Java records, which is exactly what you want when you’re shovelling things across a network boundary.
So the work that’s actually yours is: implement one or more SkillHandler beans, declare an AgentCard bean describing your agent, and (if you’re a client too) ask Spring for an A2aClient. That’s basically it.
4. The downstream agent — shipping-tracker
Let’s start with the easy one. Pure A2A server, no outbound calls, one skill called track-shipment. Whole thing is one config class.
@Configuration(proxyBeanMethods = false)
@Profile("order-shipping")
@ConditionalOnProperty(prefix = "spring.ai.a2a.server", name = "enabled", havingValue = "true")
public class ShippingTrackerConfig {
public static final String SKILL_ID = "track-shipment";
private static final Map<String, String> TRACKING_DB = Map.of(
"1Z999AA10123456784", "In transit — arrived at distribution center in Lyon, France. ETA: tomorrow.",
"FX555000123", "Out for delivery — driver expected between 14:00 and 18:00.",
"DHL7788991122", "Delivered — signed for by 'M. SILVA' at 09:42.",
"USPS940010000000", "Label created — awaiting carrier pickup."
);
@Bean
public AgentCard shippingTrackerAgentCard() {
AgentSkill skill = new AgentSkill(
SKILL_ID,
"Track Shipment",
"Returns the latest carrier status for a tracking number.",
List.of("logistics", "shipping", "tracking"),
List.of("UPS:1Z999AA10123456784", "FX555000123")
);
return AgentCard.text(
"shipping-tracker-agent",
"Looks up real-time shipment status from carrier systems.",
"http://localhost:9600",
"1.0.0",
AgentCapabilities.basic(),
List.of(skill)
);
}
@Bean
public SkillHandler trackShipmentSkill() {
return new SkillHandler() {
@Override public String skillId() { return SKILL_ID; }
@Override
public Task handle(Message request, Task seed) {
String input = Part.texts(request.parts().stream())
.reduce((a, b) -> a + " " + b).orElse("").trim();
String status = lookup(input);
Task.Artifact artifact = new Task.Artifact(
"track-" + seed.id(),
"shipment-status",
List.of(new Part.TextPart(status))
);
return new Task(seed.id(), seed.contextId(),
Task.TaskStatus.of(Task.State.completed),
seed.history(), List.of(artifact), seed.createdAt());
}
};
}
static String lookup(String rawInput) {
// Accept either "CARRIER:NUMBER" or just "NUMBER".
String number = rawInput.contains(":")
? rawInput.substring(rawInput.indexOf(':') + 1).trim()
: rawInput.trim();
String status = TRACKING_DB.get(number);
return status != null ? status
: "No tracking information found for '" + number + "'.";
}
}
A few things worth pointing out, because this pattern repeats for every skill you’ll ever write:
- The skill is just a SkillHandler bean. Spring AI’s auto-config picks it up and routes incoming message/send calls to it based on skillId(). No @RequestMapping, no JSON parsing, none of that.
- The agent card is a bean too. That bean is what gets served at /.well-known/agent-card.json. The description, tags and examples fields aren’t decoration: they’re what an LLM-based caller will read to decide whether to send work your way. Treat them as part of the public API.
- Output is a Task.Artifact. An artifact is just a typed bucket of Parts. We’re using TextPart here because the answer is a string. If we had structured data (JSON, a file, an image) we’d use DataPart or FilePart instead. Same envelope.
- The lookup is deterministic. No LLM call here. That’s deliberate. The point of the demo is to show the protocol, not to throw a model at every problem. In the real world you’d swap TRACKING_DB for a RestClient call to UPS or DHL.
That’s the whole downstream agent. Run it with --spring.profiles.active=order-shipping and you have a working A2A endpoint.
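On that last point about swapping TRACKING_DB for a real carrier call: one way to leave room for it is to hide the data source behind a tiny interface, so the parsing rule stays the same whichever backend answers. This is a sketch, not the demo’s code. CarrierStatusSource, InMemoryCarrierSource and ShipmentLookup are invented names.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical seam: anything that can map a tracking number to a status.
interface CarrierStatusSource {
    Optional<String> statusFor(String trackingNumber);
}

// In-memory stand-in, playing the role TRACKING_DB plays in the demo.
class InMemoryCarrierSource implements CarrierStatusSource {
    private final Map<String, String> statuses;

    InMemoryCarrierSource(Map<String, String> statuses) {
        this.statuses = Map.copyOf(statuses);
    }

    @Override
    public Optional<String> statusFor(String trackingNumber) {
        return Optional.ofNullable(statuses.get(trackingNumber));
    }
}

class ShipmentLookup {
    // Same parsing rule as the demo's lookup(): accept "CARRIER:NUMBER" or just "NUMBER".
    static String trackingNumber(String rawInput) {
        int colon = rawInput.indexOf(':');
        return (colon >= 0 ? rawInput.substring(colon + 1) : rawInput).trim();
    }

    // Resolves the status through whichever source is plugged in.
    static String lookup(String rawInput, CarrierStatusSource source) {
        String number = trackingNumber(rawInput);
        return source.statusFor(number)
                .orElse("No tracking information found for '" + number + "'.");
    }
}
```

A production implementation would be another CarrierStatusSource that makes the HTTP call; the skill handler and the parsing stay untouched.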
5. The upstream agent — order-status (server and client)
This is the interesting one. Same SkillHandler shape as before, but now the handler does an A2A call to another agent inside its own logic.
First the wiring — agent card + the outbound A2aClient + the skill:
@Configuration(proxyBeanMethods = false)
@Profile("order-status")
@ConditionalOnProperty(prefix = "spring.ai.a2a.server", name = "enabled", havingValue = "true")
@EnableConfigurationProperties(OrderStatusConfig.OrderProps.class)
public class OrderStatusConfig {
public static final String SKILL_ID = "order-status";
static final Map<String, Order> ORDERS = Map.of(
"ORD-1001", new Order("ORD-1001", "Mechanical keyboard", "UPS", "1Z999AA10123456784"),
"ORD-1002", new Order("ORD-1002", "Coffee beans (1kg)", "FedEx", "FX555000123"),
"ORD-1003", new Order("ORD-1003", "Running shoes", "DHL", "DHL7788991122"),
"ORD-1004", new Order("ORD-1004", "USB-C cable", "USPS", "USPS940010000000")
);
@Bean
public AgentCard orderStatusAgentCard() {
AgentSkill skill = new AgentSkill(
SKILL_ID, "Order Status",
"Tells a customer where their order is — combines order data with live shipment tracking.",
List.of("ecommerce", "customer-service", "a2a-to-a2a"),
List.of("ORD-1001", "Where is ORD-1003?")
);
return AgentCard.text("order-status-agent",
"Answers 'where is my order?' by delegating shipment tracking to a downstream A2A agent.",
"http://localhost:9601", "1.0.0",
AgentCapabilities.basic(), List.of(skill));
}
/** The outbound A2A client that points to the shipping-tracker agent. */
@Bean
public A2aClient shippingTrackerClient(OrderProps props) {
return A2aClientBuilder.forUrl(props.shippingUrl()).build();
}
@Bean
public SkillHandler orderStatusSkill(A2aClient shippingTrackerClient) {
return new SkillHandler() {
@Override public String skillId() { return SKILL_ID; }
@Override public Task handle(Message request, Task seed) {
return doLookup(request, seed, shippingTrackerClient);
}
};
}
@ConfigurationProperties(prefix = "order-status")
public record OrderProps(@DefaultValue("http://localhost:9600") String shippingUrl) {}
record Order(String id, String product, String carrier, String trackingNumber) {}
}
The thing to notice is that A2aClient is just a regular Spring bean. We injected it into the skill the same way we’d inject a JdbcTemplate. The client knows exactly one URL (that of the downstream agent) and exposes the four A2A methods. Nothing about it is LLM-aware.
Now the actual work. This is the bit that does the A2A hop:
static Task doLookup(Message request, Task seed, A2aClient shippingTracker) {
String userText = Part.texts(request.parts().stream())
.reduce((a, b) -> a + " " + b).orElse("").trim();
String orderId = extractOrderId(userText); // regex: ORD-\d+
Order order = orderId == null ? null : ORDERS.get(orderId);
if (order == null) {
return failed(seed, "Sorry, I couldn't find an order matching '" + userText + "'.");
}
String shipmentStatus;
String downstreamTaskId = null;
try {
// ─── A2A-to-A2A hop: order-status → shipping-tracker ───────────────
String trackingQuery = order.carrier() + ":" + order.trackingNumber();
Task tracked = shippingTracker.sendMessage(Message.userText(trackingQuery));
downstreamTaskId = tracked.id();
shipmentStatus = tracked.artifacts().stream()
.flatMap(a -> Part.texts(a.parts().stream()))
.findFirst()
.orElse("(no status returned by carrier)");
} catch (Exception ex) {
// Downstream is down? Degrade gracefully — never propagate the failure raw.
shipmentStatus = "(live tracking is currently unavailable — please retry shortly)";
}
String customerReply = String.format(
"Order %s — %s (carrier: %s, tracking #%s).%n → %s",
order.id(), order.product(), order.carrier(), order.trackingNumber(), shipmentStatus);
Task.Artifact artifact = new Task.Artifact(
"order-status-" + seed.id(), "order-status-reply",
List.of(
new Part.TextPart(customerReply),
new Part.DataPart(Map.of(
"orderId", order.id(),
"carrier", order.carrier(),
"trackingNumber", order.trackingNumber(),
"downstreamAgent", "shipping-tracker-agent",
"downstreamTaskId", downstreamTaskId == null ? "" : downstreamTaskId
))
)
);
return new Task(seed.id(), seed.contextId(),
Task.TaskStatus.of(Task.State.completed),
seed.history(), List.of(artifact), seed.createdAt());
}
There’s a lot in there but most of it is shaping data. The actually-A2A line is this one:
Task tracked = shippingTracker.sendMessage(Message.userText(trackingQuery));
That’s it. One method call, fully synchronous, returns a Task. Same shape as what we return. This is what makes the protocol composable: the thing you get back is the same thing your callers expect from you. You can stack as many hops as you like and the data model never changes.
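The handler also calls extractOrderId, which only appears here as a comment (regex: ORD-\d+). A minimal sketch of what such a helper might look like follows. The class name and the exact matching rules (word boundaries, first match wins, case-sensitive) are assumptions for illustration, and the version in the repo may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the first order id out of free-form text.
class OrderIds {
    // "ORD-" followed by one or more digits, as in the comment in doLookup.
    private static final Pattern ORDER_ID = Pattern.compile("\\bORD-\\d+\\b");

    // Returns the first matching id, or null when the text contains none
    // (doLookup treats null as "order not found").
    static String extractOrderId(String text) {
        if (text == null) {
            return null;
        }
        Matcher matcher = ORDER_ID.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }
}
```

Returning null keeps the sketch compatible with the null check in doLookup; an Optional would be the more defensive choice if you control both sides.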
Two more things I want to draw attention to in that handler, because they’re patterns I’d reach for again:
- The try/catch around the downstream call is non-negotiable. A2A doesn’t change the rules of distributed systems: the other agent can be down, slow, or just plain wrong. Catch the exception, return a degraded-but-still-completed Task, and log the failure with full context (the snippet above leaves the log call out). Never let a downstream error bubble up to your caller as a 500. Your skill’s contract is “I always return a Task”; live up to it.
- Carry a DataPart next to the TextPart. The text part is for the human. The data part is for anything programmatic downstream: observability, audits, chaining into another agent. In the demo I’m stuffing the downstreamTaskId into it, which means a tracing tool can follow the call from the client through both hops just by walking task IDs. That’s nearly free observability, and well worth the extra few bytes.
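The catch-log-degrade pattern from the first bullet can be pulled out into a small reusable wrapper. This is a sketch rather than the demo’s code: DownstreamCalls and callOrFallback are hypothetical names, and it uses java.util.logging only to stay dependency-free.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: run a downstream call, log any failure, return a fallback.
class DownstreamCalls {
    private static final Logger LOG = Logger.getLogger(DownstreamCalls.class.getName());

    static <T> T callOrFallback(String description, Supplier<T> call, T fallback) {
        try {
            return call.get();
        } catch (RuntimeException ex) {
            // Log with context so the degraded reply can be traced back to its cause.
            LOG.log(Level.WARNING, "Downstream call failed: " + description, ex);
            return fallback;
        }
    }
}
```

In the handler, the supplier would wrap the sendMessage call plus the artifact extraction, and the fallback would be the “live tracking is currently unavailable” string.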
6. The flow, step by step
Here’s what actually happens on the wire when the client asks “Where is ORD-1003?”.
Walking through it once:
- The client sends message/send to :9601 with the user’s question as a TextPart.
- order-status regex-extracts ORD-1003 from the free-form text.
- It looks the order up in its in-memory order book (in production: a JPA repo, an OMS REST call, whatever).
- It builds a tracking query like "DHL:DHL7788991122".
- It calls shippingTracker.sendMessage(...), a second A2A hop, this time as a client.
- shipping-tracker runs its own track-shipment skill and returns a completed Task with one artifact.
- order-status extracts the text from the downstream artifact, composes a customer-friendly reply, and packs both TextPart and DataPart into one artifact.
- It returns a single completed Task to the original caller. The client prints the text and exits.
The original caller made one round trip. Two agents collaborated. Nobody saw it but you and the logs.
7. The client (because it’s tiny and beautiful)
For completeness, here’s the entire client. It’s a CommandLineRunner that does discovery, sends one message, prints whatever artifacts come back, exits.
@Component
@Profile("order-client")
public class OrderClientRunner implements CommandLineRunner {
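// SLF4J: org.slf4j.Logger, org.slf4j.LoggerFactory
private static final Logger log = LoggerFactory.getLogger(OrderClientRunner.class);
private final A2aClient client;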
public OrderClientRunner(A2aClient client) { this.client = client; }
@Override
public void run(String... args) {
String question = args.length > 0 ? String.join(" ", args)
: "Where is my order ORD-1001?";
AgentCard card = client.fetchAgentCard(); // discovery
log.info("Discovered '{}' v{} at {} — skills: {}",
card.name(), card.version(), card.url(),
card.skills().stream().map(AgentSkill::id).toList());
log.info("Asking: \"{}\"", question);
Task task = client.sendMessage(Message.userText(question)); // the call
log.info("Task {} state={}", task.id(), task.status().state());
task.artifacts().forEach(a ->
Part.texts(a.parts().stream()).forEach(line -> log.info(" → {}", line)));
}
}
Notice we always call fetchAgentCard() before sendMessage. That’s not just polite: it’s the whole reason discovery exists. In a real client (especially an LLM-driven one) you’d cache the card and use the description, examples and skills to decide whether this is the right agent for the job at hand. Skipping discovery means hard-coding the contract, which kind of defeats the point.
The output looks like this:
Order ORD-1003 — Running shoes (carrier: DHL, tracking #DHL7788991122).
→ Delivered — signed for by 'M. SILVA' at 09:42.
One line of code on the client side. Two agents collaborated to produce it.
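Going back to discovery for a second: here’s a sketch of what using the card to pick the right agent could look like. CardSkill and SkillMatcher are stand-ins invented for this example. They only mirror the fields the demo’s AgentSkill is constructed with (id, name, description, tags, examples), and the real records may expose more.

```java
import java.util.List;
import java.util.Optional;

// Stand-in for the skill entries on an agent card (hypothetical, for illustration).
record CardSkill(String id, String name, String description,
                 List<String> tags, List<String> examples) {}

class SkillMatcher {
    // Picks the first advertised skill carrying the wanted tag, if any.
    static Optional<CardSkill> findByTag(List<CardSkill> skills, String wantedTag) {
        return skills.stream()
                .filter(skill -> skill.tags().contains(wantedTag))
                .findFirst();
    }
}
```

A tag match is the crudest possible selector. An LLM-driven client would more likely hand the descriptions and examples to the model and let it choose.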
8. Things I learned the hard way (so you don’t have to)
Some of these are genuinely A2A-specific, others are just classic distributed-systems lessons that bite extra hard when there’s an LLM somewhere in the loop.
Keep routing in code, not in the LLM. In the demo, the order-status skill programmatically decides to call shipping-tracker. There’s no LLM going “hmm, maybe I should call the other agent”. I cannot stress enough how much more reliable this is. LLM-driven routing is fine for exploration, terrible for production. If your system has a known-good decision tree, write it in Java. Save the LLM for the bits that genuinely need fuzzy reasoning.
Make every skill idempotent. A2A clients will retry. Your callers will retry. The network will burp. If track-shipment is idempotent — same input → same output, no side effects — none of that matters. The moment you start firing “send the customer an email” or “charge the card” inside a skill, you need an idempotency key, a deduplication store, and probably a human-in-the-loop gate. Easier to design the skill so it doesn’t need any of that.
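For skills that can’t avoid side effects, the deduplication store mentioned above can start out as a keyed cache of results. A minimal in-memory sketch, with a hypothetical IdempotentExecutor class; a real deployment would need a shared, persistent store with expiry, which this does not attempt.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical in-memory deduplication store keyed by an idempotency key.
class IdempotentExecutor<T> {
    private final ConcurrentMap<String, T> results = new ConcurrentHashMap<>();

    // Runs the action at most once per key; retries with the same key get the stored result.
    // The action must return a non-null value, otherwise nothing is recorded
    // and the next call with the same key runs it again.
    T runOnce(String idempotencyKey, Supplier<T> action) {
        return results.computeIfAbsent(idempotencyKey, key -> action.get());
    }
}
```

The key would typically be derived from something the caller controls, such as the task id or an explicit request id, so that a retry maps to the same entry.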
Budget your hops. This demo has exactly one downstream call, but A2A is composable — agent A can call B which can call C which can call A again, and now you have a loop. Set a max hop depth (track it through contextId or a header), put a wall-clock timeout on every A2aClient call, and fail fast. It’s the same max_iterations pattern you’d use in a single ReAct agent, just at the network boundary.
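A hop budget can be a very small value object. The sketch below assumes you have some way to carry it between agents (a header or message metadata); HopBudget is a hypothetical name and nothing in the starter provides it.

```java
// Hypothetical hop counter that travels with each request (for example in a header).
record HopBudget(int hopsTaken, int maxHops) {

    static HopBudget start(int maxHops) {
        return new HopBudget(0, maxHops);
    }

    // Call before each outbound agent call; fails fast once the budget is spent.
    HopBudget nextHop() {
        if (hopsTaken >= maxHops) {
            throw new IllegalStateException(
                    "Hop budget exhausted: " + hopsTaken + " of " + maxHops + " hops used");
        }
        return new HopBudget(hopsTaken + 1, maxHops);
    }
}
```

Each agent would read the incoming budget, call nextHop() before delegating, and forward the returned value. An A → B → C → A loop then dies at the configured depth instead of spinning.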
Trace through the taskId chain. The DataPart trick I showed in section 5 — stashing downstreamTaskId in your reply — is what makes a multi-agent trace navigable after the fact. Your APM probably won’t follow A2A calls automatically (it’s just HTTP to it), so leave breadcrumbs. Even better: propagate a traceparent header through A2aClient so OpenTelemetry can stitch the spans together.
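For the traceparent idea, the header value itself is easy to build: the W3C Trace Context format is version, 32-hex trace id, 16-hex parent (span) id, and 2-hex flags, joined by dashes. The sketch below only produces and continues such values. TraceParent is a hypothetical helper, and how you attach the header to an outbound A2aClient call depends on what the client builder lets you customise, which isn’t shown in this post.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper that builds W3C Trace Context "traceparent" header values.
class TraceParent {
    private static final String VERSION = "00";
    private static final String SAMPLED = "01";

    // Starts a new trace: fresh 128-bit trace id plus a fresh 64-bit span id.
    static String newRoot() {
        String traceId = hex64(nonZeroRandom()) + hex64(nonZeroRandom());
        return format(traceId, hex64(nonZeroRandom()), SAMPLED);
    }

    // Continues an incoming trace: same trace id and flags, new span id for the outbound hop.
    static String childOf(String incomingTraceparent) {
        String[] fields = incomingTraceparent.split("-");
        if (fields.length != 4 || fields[1].length() != 32) {
            throw new IllegalArgumentException("Not a traceparent value: " + incomingTraceparent);
        }
        return format(fields[1], hex64(nonZeroRandom()), fields[3]);
    }

    private static String format(String traceId, String spanId, String flags) {
        return VERSION + "-" + traceId + "-" + spanId + "-" + flags;
    }

    // Zero-padded, lowercase, 16 hex digits.
    private static String hex64(long value) {
        return String.format("%016x", value);
    }

    // All-zero ids are invalid in the spec, so skip zero.
    private static long nonZeroRandom() {
        long value;
        do {
            value = ThreadLocalRandom.current().nextLong();
        } while (value == 0L);
        return value;
    }
}
```

If you already run an OpenTelemetry agent or SDK, let it generate and propagate these values instead; the sketch is only meant to show what travels on the wire.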
Don’t share state across agents. Each agent in the demo has its own little world: order-status owns the order book, shipping-tracker owns the carrier DB. They never reach into each other’s data. If they ever need to share a fact, it goes over the wire, in a Part, as part of an explicit message. That’s what keeps the agents independently deployable, independently testable, and independently scalable. The moment you’re tempted to share a database between two agents, you’ve made one agent that happens to be split across two JVMs — which is the worst of both worlds.
Per-agent budgets. Even though we’re not doing LLM calls in the demo skills, the BEDROCK_MODEL and Bedrock client are wired in. The minute you add LLM-backed reasoning to one of these skills — say, “summarise the carrier’s vague status into something a human can understand” — give that call its own token budget and timeout. Don’t let one agent’s runaway model invocation eat the whole request budget.
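The timeout half of that budget can be sketched with plain CompletableFuture. TimeBudget and callWithin are hypothetical names; the token budget is model-specific and not covered here.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper: give one call a wall-clock budget and a fallback value.
class TimeBudget {
    static <T> T callWithin(Duration budget, Supplier<T> call, T fallback) {
        return CompletableFuture.supplyAsync(call)
                // If the call has not finished within the budget, complete with the fallback.
                .completeOnTimeout(fallback, budget.toMillis(), TimeUnit.MILLISECONDS)
                // If the call threw, also degrade to the fallback.
                .exceptionally(ex -> fallback)
                .join();
    }
}
```

One caveat: the timed-out work is not cancelled, it keeps running on the common pool until it finishes. For a model invocation you would also want the underlying HTTP client’s own timeout so the request is actually abandoned.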
Encapsulate, encapsulate, encapsulate. This one is almost philosophical. The reason the order-client doesn’t know about shipping-tracker is the same reason your OrderService doesn’t know about your PaymentGateway’s connection pool: it’s none of its business. A2A makes this trivial — every agent is a black box behind a single skill name. Lean into that. Don’t let your client code start “knowing” about downstream agents. The moment it does, you’ve leaked an internal detail into your public API and you’ve lost the ability to swap, scale or replace the downstream agent without coordinated changes.
9. Wrap up
What’s nice about this whole exercise is how ordinary it ends up looking. There’s no agent framework. No orchestration DSL. No giant prompt that pretends to coordinate everything. There’s:
- Two @Configuration classes.
- Two SkillHandler beans.
- One A2aClient bean, injected into a skill.
- A handful of A2A protocol records (Task, Message, Part, …).
- Spring AI’s auto-configuration filling in the HTTP plumbing.
The result is a system where you can swap either agent for a totally different implementation — Python, Go, a hosted service, doesn’t matter — as long as it speaks A2A. That portability is the real prize. Multi-agent systems are interesting; multi-agent systems whose components you can casually replace at the protocol boundary are a different conversation entirely.
The full source — including streaming endpoints, integration tests, agent-card discovery, and a working AWS Bedrock wiring you can plug into the skills — is in the order-status repo. Clone it, run the three profiles in three terminals, and ask it where your order is. It’ll tell you.
Next up in this little A2A series I want to actually put an LLM inside one of these skills — let the order-status agent reason about ambiguous user input rather than regex its way through. That’s where it gets fun.