Agent-to-Agent with Spring AI: Two Agents, One Conversation, Zero Magic

I’ve spent a lot of the last few posts on single-agent stuff — RAG, memory, advisors, the whole “one ChatClient does everything” pipeline. That works great until your one agent starts to look like a kitchen drawer: every tool jammed in there, every prompt trying to be all things to all people. At some point you want to break it apart and let specialised agents do specialised things.

That’s what the Agent-to-Agent (A2A) protocol is for. And Spring AI has a starter for it that, honestly, is way less scary than the spec might make you think. So in this post I’ll walk through a tiny demo I built — order-status — that has two agents collaborating to answer one customer question. No frameworks-on-top-of-frameworks, no orchestrator-of-orchestrators, just two Spring Boot processes talking to each other in JSON.

The whole question is:

“Where is my order ORD-1003?”

And the whole answer involves two agents quietly cooperating without the caller ever knowing.

1. Why bother with a protocol at all?

Honest question: if I have two Spring Boot services and I want one to call the other, why not just expose a REST endpoint and call it a day? I do that every week.

The reason A2A is interesting is that it standardises the bits that we always end up reinventing when we string LLMs together:

Discovery — how do you find out what an agent can do, and on which input/output modes? A2A gives you GET /.well-known/agent-card.json.
Skills — the unit of work isn’t “an HTTP endpoint”, it’s “a named skill” with a description, examples and tags. The exact things an LLM needs to decide whether to call you.
Tasks — every call returns a Task with a state (submitted → working → completed/failed) and a list of Artifacts. Same shape whether you answered in 5ms or streamed for 30s.
Streaming — message/send for fire-and-forget, message/stream for SSE. Same data model on both sides.

In short: it’s the bare minimum contract two LLM-shaped systems need to talk to each other without coupling. You can argue it’s overkill for two of your own services. You can’t really argue it’s overkill the moment one of those services is provided by someone else.

2. The toy: one customer question, two agents

Here’s what we’re building. Two specialised agents, one client, three JVMs.

Figure: The order-status agent is both a server and a client at the same time — that's what makes A2A composable.

The roles:

Profile	Role	Port
`order-shipping`	A2A server — exposes the `track-shipment` skill	9600
`order-status`	A2A server to its caller and A2A client of shipping-tracker	9601
`order-client`	A tiny CLI client — sends one message, prints the answer, exits	random

The big idea is in row two: the order-status agent is simultaneously a server and a client. It answers the client’s question, and to do that it goes and asks another agent for help. The client never knows there’s a second agent involved. That encapsulation — “I just call one agent, and it figures it out” — is the whole point of A2A composition. You can stack this however deep you need.

3. What Spring AI gives you for free

The Spring AI A2A starter is doing most of the boring HTTP work. Here’s what you actually get:

Server side: an A2aController that wires up POST /a2a (sync) and POST /a2a/stream (SSE), plus GET /.well-known/agent-card.json for discovery. A TaskStore keeps track of in-flight and completed tasks. All of this is auto-configured the moment you set spring.ai.a2a.server.enabled=true.
A SkillHandler interface — the only code you have to write to expose a skill. One method: Task handle(Message request, Task seed). That’s the whole strategy interface.
Client side: an A2aClient (built via A2aClientBuilder) with four methods: fetchAgentCard, sendMessage, streamMessage, getTask. You point it at a base URL and it does the JSON-RPC marshalling for you.
The protocol records: AgentCard, AgentSkill, Message, Part (Text / Data / File, sealed), Task, TaskEvent. Immutable Java records, which is exactly what you want when you’re shovelling things across a network boundary.

So the work that’s actually yours is: implement one or more SkillHandler beans, declare an AgentCard bean describing your agent, and (if you’re a client too) ask Spring for an A2aClient. That’s basically it.

4. The downstream agent — `shipping-tracker`

Let’s start with the easy one. Pure A2A server, no outbound calls, one skill called track-shipment. Whole thing is one config class.

@Configuration(proxyBeanMethods = false)
@Profile("order-shipping")
@ConditionalOnProperty(prefix = "spring.ai.a2a.server", name = "enabled", havingValue = "true")
public class ShippingTrackerConfig {

    public static final String SKILL_ID = "track-shipment";

    private static final Map<String, String> TRACKING_DB = Map.of(
            "1Z999AA10123456784", "In transit — arrived at distribution center in Lyon, France. ETA: tomorrow.",
            "FX555000123",        "Out for delivery — driver expected between 14:00 and 18:00.",
            "DHL7788991122",      "Delivered — signed for by 'M. SILVA' at 09:42.",
            "USPS940010000000",   "Label created — awaiting carrier pickup."
    );

    @Bean
    public AgentCard shippingTrackerAgentCard() {
        AgentSkill skill = new AgentSkill(
                SKILL_ID,
                "Track Shipment",
                "Returns the latest carrier status for a tracking number.",
                List.of("logistics", "shipping", "tracking"),
                List.of("UPS:1Z999AA10123456784", "FX555000123")
        );
        return AgentCard.text(
                "shipping-tracker-agent",
                "Looks up real-time shipment status from carrier systems.",
                "http://localhost:9600",
                "1.0.0",
                AgentCapabilities.basic(),
                List.of(skill)
        );
    }

    @Bean
    public SkillHandler trackShipmentSkill() {
        return new SkillHandler() {
            @Override public String skillId() { return SKILL_ID; }

            @Override
            public Task handle(Message request, Task seed) {
                String input = Part.texts(request.parts().stream())
                        .reduce((a, b) -> a + " " + b).orElse("").trim();
                String status = lookup(input);

                Task.Artifact artifact = new Task.Artifact(
                        "track-" + seed.id(),
                        "shipment-status",
                        List.of(new Part.TextPart(status))
                );
                return new Task(seed.id(), seed.contextId(),
                        Task.TaskStatus.of(Task.State.completed),
                        seed.history(), List.of(artifact), seed.createdAt());
            }
        };
    }

    static String lookup(String rawInput) {
        // Accept either "CARRIER:NUMBER" or just "NUMBER".
        String number = rawInput.contains(":")
                ? rawInput.substring(rawInput.indexOf(':') + 1).trim()
                : rawInput.trim();
        String status = TRACKING_DB.get(number);
        return status != null ? status
                : "No tracking information found for '" + number + "'.";
    }
}

A few things worth pointing out, because this pattern repeats for every skill you’ll ever write:

The skill is just a SkillHandler bean. Spring AI’s auto-config picks it up and routes incoming message/send calls to it based on skillId(). No @RequestMapping, no JSON parsing, none of that.
The agent card is a bean too. That bean is what gets served at /.well-known/agent-card.json. The description, tags and examples fields aren’t decoration — they’re what an LLM-based caller will read to decide whether to send work your way. Treat them as part of the public API.
Output is a Task.Artifact. An artifact is just a typed bucket of Parts. We’re using TextPart here because the answer is a string. If we had structured data — JSON, a file, an image — we’d use DataPart or FilePart instead. Same envelope.
The lookup is deterministic. No LLM call here. That’s deliberate. The point of the demo is to show the protocol, not to throw a model at every problem. In the real world you’d swap TRACKING_DB for a RestClient call to UPS or DHL.

That’s the whole downstream agent. Run it with --spring.profiles.active=order-shipping and you have a working A2A endpoint.

5. The upstream agent — `order-status` (server and client)

This is the interesting one. Same SkillHandler shape as before, but now the handler does an A2A call to another agent inside its own logic.

First the wiring — agent card + the outbound A2aClient + the skill:

@Configuration(proxyBeanMethods = false)
@Profile("order-status")
@ConditionalOnProperty(prefix = "spring.ai.a2a.server", name = "enabled", havingValue = "true")
@EnableConfigurationProperties(OrderStatusConfig.OrderProps.class)
public class OrderStatusConfig {

    public static final String SKILL_ID = "order-status";

    static final Map<String, Order> ORDERS = Map.of(
        "ORD-1001", new Order("ORD-1001", "Mechanical keyboard", "UPS",  "1Z999AA10123456784"),
        "ORD-1002", new Order("ORD-1002", "Coffee beans (1kg)", "FedEx", "FX555000123"),
        "ORD-1003", new Order("ORD-1003", "Running shoes",      "DHL",   "DHL7788991122"),
        "ORD-1004", new Order("ORD-1004", "USB-C cable",        "USPS",  "USPS940010000000")
    );

    @Bean
    public AgentCard orderStatusAgentCard() {
        AgentSkill skill = new AgentSkill(
                SKILL_ID, "Order Status",
                "Tells a customer where their order is — combines order data with live shipment tracking.",
                List.of("ecommerce", "customer-service", "a2a-to-a2a"),
                List.of("ORD-1001", "Where is ORD-1003?")
        );
        return AgentCard.text("order-status-agent",
                "Answers 'where is my order?' by delegating shipment tracking to a downstream A2A agent.",
                "http://localhost:9601", "1.0.0",
                AgentCapabilities.basic(), List.of(skill));
    }

    /** The outbound A2A client that points to the shipping-tracker agent. */
    @Bean
    public A2aClient shippingTrackerClient(OrderProps props) {
        return A2aClientBuilder.forUrl(props.shippingUrl()).build();
    }

    @Bean
    public SkillHandler orderStatusSkill(A2aClient shippingTrackerClient) {
        return new SkillHandler() {
            @Override public String skillId() { return SKILL_ID; }
            @Override public Task handle(Message request, Task seed) {
                return doLookup(request, seed, shippingTrackerClient);
            }
        };
    }

    @ConfigurationProperties(prefix = "order-status")
    public record OrderProps(@DefaultValue("http://localhost:9600") String shippingUrl) {}
    record Order(String id, String product, String carrier, String trackingNumber) {}
}

The thing to notice is that A2aClient is just a regular Spring bean. We injected it into the skill the same way we’d inject a JdbcTemplate. The client knows exactly one URL — that of the downstream agent — and exposes the four A2A methods. Nothing about it is LLM-aware.

Now the actual work. This is the bit that does the A2A hop:

static Task doLookup(Message request, Task seed, A2aClient shippingTracker) {
    String userText = Part.texts(request.parts().stream())
            .reduce((a, b) -> a + " " + b).orElse("").trim();
    String orderId = extractOrderId(userText);          // regex: ORD-\d+

    Order order = orderId == null ? null : ORDERS.get(orderId);
    if (order == null) {
        return failed(seed, "Sorry, I couldn't find an order matching '" + userText + "'.");
    }

    String shipmentStatus;
    String downstreamTaskId = null;
    try {
        // ─── A2A-to-A2A hop: order-status → shipping-tracker ───────────────
        String trackingQuery = order.carrier() + ":" + order.trackingNumber();
        Task tracked = shippingTracker.sendMessage(Message.userText(trackingQuery));
        downstreamTaskId = tracked.id();
        shipmentStatus = tracked.artifacts().stream()
                .flatMap(a -> Part.texts(a.parts().stream()))
                .findFirst()
                .orElse("(no status returned by carrier)");
    } catch (Exception ex) {
        // Downstream is down? Degrade gracefully — never propagate the failure raw.
        shipmentStatus = "(live tracking is currently unavailable — please retry shortly)";
    }

    String customerReply = String.format(
            "Order %s — %s (carrier: %s, tracking #%s).%n  → %s",
            order.id(), order.product(), order.carrier(), order.trackingNumber(), shipmentStatus);

    Task.Artifact artifact = new Task.Artifact(
            "order-status-" + seed.id(), "order-status-reply",
            List.of(
                new Part.TextPart(customerReply),
                new Part.DataPart(Map.of(
                    "orderId",          order.id(),
                    "carrier",          order.carrier(),
                    "trackingNumber",   order.trackingNumber(),
                    "downstreamAgent",  "shipping-tracker-agent",
                    "downstreamTaskId", downstreamTaskId == null ? "" : downstreamTaskId
                ))
            )
    );
    return new Task(seed.id(), seed.contextId(),
            Task.TaskStatus.of(Task.State.completed),
            seed.history(), List.of(artifact), seed.createdAt());
}

There’s a lot in there but most of it is shaping data. The actually-A2A line is this one:

Task tracked = shippingTracker.sendMessage(Message.userText(trackingQuery));

That’s it. One method call, fully synchronous, returns a Task. Same shape as what we return. This is what makes the protocol composable — the thing you call back is the same thing your callers expect from you. You can stack as many hops as you like and the data model never changes.

Two more things I want to draw attention to in that handler, because they’re patterns I’d reach for again:

The try/catch around the downstream call is non-negotiable. A2A doesn’t change the rules of distributed systems — the other agent can be down, slow, or just plain wrong. Catch the exception, return a degraded-but-still-completed Task, log the failure with full context. Never let a downstream error bubble up to your caller as a 500. Your skill’s contract is “I always return a Task”; live up to it.
Carry a DataPart next to the TextPart. The text part is for the human. The data part is for anything programmatic downstream — observability, audits, chaining into another agent. In the demo I’m stuffing the downstreamTaskId into it, which means a tracing tool can follow the call from the client through both hops just by walking task IDs. That’s free observability and you should always pay the price of the extra few bytes.

6. The flow, step by step

Here’s what actually happens on the wire when the client asks “Where is ORD-1003?”.

Figure: One question in, two A2A hops, one composed reply out — the caller only sees its single round trip.

Walking through it once:

The client sends message/send to :9601 with the user’s question as a TextPart.
order-status regex-extracts ORD-1003 from the free-form text.
It looks the order up in its in-memory order book (in production: a JPA repo, an OMS REST call, whatever).
It builds a tracking query like "DHL:DHL7788991122".
It calls shippingTracker.sendMessage(...) — a second A2A hop, this time as a client.
shipping-tracker runs its own track-shipment skill, returns a completed Task with one artifact.
order-status extracts the text from the downstream artifact, composes a customer-friendly reply, packs both TextPart and DataPart into one artifact.
Returns a single completed Task to the original caller. The client prints the text and exits.

The original caller made one round trip. Two agents collaborated. Nobody saw it but you and the logs.

7. The client (because it’s tiny and beautiful)

For completeness, here’s the entire client. It’s a CommandLineRunner that does discovery, sends one message, prints whatever artifacts come back, exits.

@Component
@Profile("order-client")
public class OrderClientRunner implements CommandLineRunner {

    private final A2aClient client;

    public OrderClientRunner(A2aClient client) { this.client = client; }

    @Override
    public void run(String... args) {
        String question = args.length > 0 ? String.join(" ", args)
                                          : "Where is my order ORD-1001?";

        AgentCard card = client.fetchAgentCard();           // discovery
        log.info("Discovered '{}' v{} at {} — skills: {}",
                card.name(), card.version(), card.url(),
                card.skills().stream().map(AgentSkill::id).toList());

        log.info("Asking: \"{}\"", question);
        Task task = client.sendMessage(Message.userText(question));  // the call
        log.info("Task {} state={}", task.id(), task.status().state());

        task.artifacts().forEach(a ->
            Part.texts(a.parts().stream()).forEach(line -> log.info("  → {}", line)));
    }
}

Notice we always call fetchAgentCard() before sendMessage. That’s not just polite — it’s the whole reason discovery exists. In a real client (especially an LLM-driven one) you’d cache the card and use the description, examples and skills to decide whether this is the right agent for the job at hand. Skipping discovery means hard-coding the contract, which kind of defeats the point.

The output looks like this:

Order ORD-1003 — Running shoes (carrier: DHL, tracking #DHL7788991122).
  → Delivered — signed for by 'M. SILVA' at 09:42.

One line of code on the client side. Two agents collaborated to produce it.

8. Things I learned the hard way (so you don’t have to)

Some of these are genuinely A2A-specific, others are just classic distributed-systems lessons that bite extra hard when there’s an LLM somewhere in the loop.

Keep routing in code, not in the LLM. In the demo, the order-status skill programmatically decides to call shipping-tracker. There’s no LLM going “hmm, maybe I should call the other agent”. I cannot stress how much more reliable this is. LLM-driven routing is fine for exploration, terrible for production. If your system has a known-good decision tree, write it in Java. Save the LLM for the bits that genuinely need fuzzy reasoning.

Make every skill idempotent. A2A clients will retry. Your callers will retry. The network will burp. If track-shipment is idempotent — same input → same output, no side effects — none of that matters. The moment you start firing “send the customer an email” or “charge the card” inside a skill, you need an idempotency key, a deduplication store, and probably a human-in-the-loop gate. Easier to design the skill so it doesn’t need any of that.

Budget your hops. This demo has exactly one downstream call, but A2A is composable — agent A can call B which can call C which can call A again, and now you have a loop. Set a max hop depth (track it through contextId or a header), put a wall-clock timeout on every A2aClient call, and fail fast. It’s the same max_iterations pattern you’d use in a single ReAct agent, just at the network boundary.

Trace through the taskId chain. The DataPart trick I showed in section 5 — stashing downstreamTaskId in your reply — is what makes a multi-agent trace navigable after the fact. Your APM probably won’t follow A2A calls automatically (it’s just HTTP to it), so leave breadcrumbs. Even better: propagate a traceparent header through A2aClient so OpenTelemetry can stitch the spans together.

Don’t share state across agents. Each agent in the demo has its own little world: order-status owns the order book, shipping-tracker owns the carrier DB. They never reach into each other’s data. If they ever need to share a fact, it goes over the wire, in a Part, as part of an explicit message. That’s what keeps the agents independently deployable, independently testable, and independently scalable. The moment you’re tempted to share a database between two agents, you’ve made one agent that happens to be split across two JVMs — which is the worst of both worlds.

Per-agent budgets. Even though we’re not doing LLM calls in the demo skills, the BEDROCK_MODEL and Bedrock client are wired in. The minute you add LLM-backed reasoning to one of these skills — say, “summarise the carrier’s vague status into something a human can understand” — give that call its own token budget and timeout. Don’t let one agent’s runaway model invocation eat the whole request budget.

Encapsulate, encapsulate, encapsulate. This one is almost philosophical. The reason the order-client doesn’t know about shipping-tracker is the same reason your OrderService doesn’t know about your PaymentGateway’s connection pool: it’s none of its business. A2A makes this trivial — every agent is a black box behind a single skill name. Lean into that. Don’t let your client code start “knowing” about downstream agents. The moment it does, you’ve leaked an internal detail into your public API and you’ve lost the ability to swap, scale or replace the downstream agent without coordinated changes.

9. Wrap up

What’s nice about this whole exercise is how ordinary it ends up looking. There’s no agent framework. No orchestration DSL. No giant prompt that pretends to coordinate everything. There’s:

Two @Configuration classes.
Two SkillHandler beans.
One A2aClient bean, injected into a skill.
A handful of A2A protocol records (Task, Message, Part, …).
Spring AI’s auto-configuration filling in the HTTP plumbing.

The result is a system where you can swap either agent for a totally different implementation — Python, Go, a hosted service, doesn’t matter — as long as it speaks A2A. That portability is the real prize. Multi-agent systems are interesting; multi-agent systems whose components you can casually replace at the protocol boundary are a different conversation entirely.

The full source — including streaming endpoints, integration tests, agent-card discovery, and a working AWS Bedrock wiring you can plug into the skills — is in the order-status repo. Clone it, run the three profiles in three terminals, and ask it where your order is. It’ll tell you.

Next up in this little A2A series I want to actually put an LLM inside one of these skills — let the order-status agent reason about ambiguous user input rather than regex its way through. That’s where it gets fun.