If you're building with Claude, you'll hit this wall.
You pick Opus. The reasoning is brilliant. The invoice arrives. You switch to Sonnet. The price drops. So does the quality on anything hard. You pick Opus again for the difficult calls and Sonnet for everything else, and now you're managing two models, two contexts, and two sets of prompts.
Anthropic just shipped a third option that makes that whole dance unnecessary.
The Claude advisor API, in beta since April 9, 2026, lets you pair a fast executor model (Sonnet or Haiku) with Opus as an on-demand advisor. When the executor hits a decision it's not confident about, it calls the advisor mid-task. Opus weighs in. The executor continues. The whole thing happens inside a single API call. No second request. No context synchronization. No orchestration layer.
In Anthropic's benchmarks, Sonnet with an Opus advisor scored 74.8% on SWE-bench Multilingual versus 72.1% for Sonnet alone, and cost 11.9% less than running Opus solo. And the Haiku numbers are even more striking, but we'll get to those.
This post covers what the Claude advisor tool is, how it works server-side, and exactly how to add it to an existing agent.
Builder 2.0 is the harness built for exactly this kind of agent work — run 20+ Claude-powered agents in parallel, each in its own cloud container with browser preview, with Slack and Jira wired in so your whole team ships via auto-generated PRs. No orchestration glue, just working features in production.
TL;DR: The Claude advisor API is a beta feature that lets you designate Claude Opus as an on-demand advisor for a faster executor model (Sonnet or Haiku). The executor calls the advisor mid-task when it needs strategic input. Everything happens in one API call — no extra network round-trips, no orchestration code, no context syncing.
The advisor pattern itself isn't new. In 2023 (which in AI terms is like saying the Bronze Age), researchers at UC Berkeley published a paper titled "How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models." They found that small models trained to generate per-instance natural language advice could noticeably improve larger models' output. Anthropic built that same pattern directly into the Claude API.
The Anthropic advisor API adds a new tool type to your existing tools array. You enable it with a single beta header. Your executor model (the one doing the actual work) knows when to call the advisor. When it does, the call happens server-side. No round-trip. No client-side logic.
This is available today on the Claude API. It doesn't need a waitlist or special application for API access. Enterprise customers with Zero Data Retention (ZDR) agreements can use it without changing their data handling setup. The advisor is explicitly ZDR-eligible.
TL;DR: The executor model generates normally until it decides it needs help. It emits a signal the server intercepts, which pauses the executor and runs Opus on the full conversation history. Opus sends back ~400-700 tokens of advice — never shown to the user — and the executor resumes informed. One API call, transparent to the client.
Here's the server-side flow step by step:
- You send a `POST /v1/messages` request with the advisor tool defined in the `tools` array
- The executor model (Sonnet or Haiku) runs and generates output as normal
- When it hits a decision it wants help with, it emits a structured token block (`{"type": "server_tool_use", "name": "advisor"}`). That's the trigger.
- The server pauses the executor and runs Opus with the entire conversation history: the original prompt, every tool call made so far, and every result the executor has seen
- Opus generates an advisory message — a plan, a correction, a strategic next step — in approximately 400-700 tokens
- That advice is injected back into the assistant message stream as an `advisor_tool_result` block. The user never sees this.
- The executor resumes, now informed by Opus's guidance, and continues generating
Nothing changes on the client side. One request in, one response out.
Two things to note. The advisor reads full conversation context but can only return text advice. Its tokens bill at the Opus rate but don't count against the executor's `max_tokens` cap. Both models' tokens appear in the `usage` object, so cost attribution is clean.
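From the client, you can spot advisor activity by scanning the returned content blocks. A minimal sketch, assuming the block type names described above (`server_tool_use` and `advisor_tool_result`); the exact block shapes here are illustrative, not taken from official documentation:

```typescript
// Hypothetical content-block shapes, based on the type names described above.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "server_tool_use"; name: string; id: string }
  | { type: "advisor_tool_result"; tool_use_id: string; content: string };

// Count how many times the executor consulted the advisor in one response.
function countAdvisorCalls(blocks: ContentBlock[]): number {
  return blocks.filter(
    (b) => b.type === "server_tool_use" && b.name === "advisor"
  ).length;
}

// Example with mocked blocks (the advice content is never user-facing).
const blocks: ContentBlock[] = [
  { type: "text", text: "Let me plan the refactor." },
  { type: "server_tool_use", name: "advisor", id: "srvtool_1" },
  {
    type: "advisor_tool_result",
    tool_use_id: "srvtool_1",
    content: "Use a bounded channel for the worker pool.",
  },
  { type: "text", text: "Here is the worker pool implementation..." },
];

console.log(countAdvisorCalls(blocks)); // 1
```

A counter like this is handy for verifying that your `max_uses` setting matches how often the executor actually asks for help.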
TL;DR: Add the `anthropic-beta: advisor-tool-2026-03-01` header to your request and include `{"type": "advisor_20260301", "name": "advisor", "model": "claude-opus-4-6"}` in your tools array. Set `max_uses` to cap advisor calls — the primary cost control lever. That's the full integration: same endpoint, same SDK version, no orchestration changes required. Your existing agent code stays unchanged.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  tools: [
    {
      type: "advisor_20260301",
      name: "advisor",
      model: "claude-opus-4-6",
      max_uses: 3,
    },
  ],
  messages: [
    {
      role: "user",
      content:
        "Refactor this Go service to use a worker pool with graceful shutdown.",
    },
  ],
});

console.log(response.content);
```

`max_uses` caps how many times the advisor can be called per request. When that limit is hit, further advisor requests return a `max_uses_exceeded` block and the executor continues without more advice. This is your primary cost control lever. Set it based on task complexity.
`caching` enables advisor-side prompt caching. Add `"caching": {"type": "ephemeral", "ttl": "5m"}` if you're expecting three or more advisor calls in a single session. It lets Opus skip re-processing unchanged context on repeat calls, which saves tokens.
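Putting both knobs together, the tool definition might look like the sketch below. The `caching` field shape is taken directly from the snippet above; treat the rest as illustrative:

```typescript
// Advisor tool definition with both cost controls applied.
const advisorTool = {
  type: "advisor_20260301",
  name: "advisor",
  model: "claude-opus-4-6",
  // Cap advisor invocations per request: the primary cost lever.
  max_uses: 3,
  // Ephemeral prompt caching, worthwhile when expecting 3+ advisor calls.
  caching: { type: "ephemeral", ttl: "5m" },
};

console.log(JSON.stringify(advisorTool, null, 2));
```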
System prompt guidance. Anthropic's recommended approach is to tell the executor when to call the advisor. Their suggested template:
"You have access to an `advisor` tool backed by a stronger model. Call the advisor before substantive work — before writing, before committing to an interpretation, before building on an assumption. Also call advisor when you believe the task is complete, before delivering output. On tasks longer than a few steps, call advisor at least once before finalizing."
In practice, most of the value comes from one or two advisor calls per task: once early for orientation, once before finalizing output.
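One way to keep the prompt and the tool definition in sync is a small request builder. A sketch: the prompt text is Anthropic's suggested template quoted above, while the builder function and its default of two advisor calls are illustrative choices:

```typescript
// Anthropic's suggested system-prompt template for steering advisor usage.
const ADVISOR_SYSTEM_PROMPT =
  "You have access to an advisor tool backed by a stronger model. " +
  "Call the advisor before substantive work — before writing, before committing " +
  "to an interpretation, before building on an assumption. Also call advisor " +
  "when you believe the task is complete, before delivering output. On tasks " +
  "longer than a few steps, call advisor at least once before finalizing.";

// Build the params object passed to client.messages.create().
// Defaults to two advisor calls: once early, once before finalizing.
function buildAdvisorRequest(userContent: string, maxUses = 2) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    system: ADVISOR_SYSTEM_PROMPT,
    tools: [
      {
        type: "advisor_20260301",
        name: "advisor",
        model: "claude-opus-4-6",
        max_uses: maxUses,
      },
    ],
    messages: [{ role: "user" as const, content: userContent }],
  };
}

console.log(buildAdvisorRequest("Plan the migration.").tools[0].max_uses); // 2
```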
Production note: Priority Tier on the executor model doesn't extend to the advisor. If you're running production workloads, track advisor token usage separately in the usage object. It's broken out by model tier, so cost attribution is clean.
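A cost-attribution helper might look like the sketch below. The `usage` field names and the per-million-token rates are placeholders, not published values; substitute the actual field names from your responses and current pricing:

```typescript
// Hypothetical usage shape: advisor tokens broken out from executor tokens.
interface AdvisorUsage {
  input_tokens: number; // executor input
  output_tokens: number; // executor output
  advisor_output_tokens: number; // Opus advisory tokens, billed at the Opus rate
}

// Placeholder per-million-output-token rates in USD (NOT real prices).
const EXECUTOR_RATE = 15;
const ADVISOR_RATE = 75;

function attributeCost(usage: AdvisorUsage) {
  const executorCost = (usage.output_tokens / 1_000_000) * EXECUTOR_RATE;
  const advisorCost = (usage.advisor_output_tokens / 1_000_000) * ADVISOR_RATE;
  return { executorCost, advisorCost, total: executorCost + advisorCost };
}

const cost = attributeCost({
  input_tokens: 2_000,
  output_tokens: 3_000,
  advisor_output_tokens: 600, // within the ~400-700 token advisory range
});
console.log(cost); // total ≈ 0.09 under these placeholder rates
```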
TL;DR: For quality-sensitive tasks — coding agents, architecture decisions, complex research — use Sonnet as executor with Opus as advisor. You get near-Opus accuracy for less than Opus alone. For high-volume, cost-sensitive workloads, the Haiku+Opus pair is worth serious consideration: 85% cheaper than Sonnet, and dramatically better quality than Haiku alone.
The announcement focuses on Sonnet+Opus, and for good reason. Sonnet with an Opus advisor scored 74.8% on SWE-bench Multilingual, up from 72.1% for Sonnet alone. That's a 2.7 percentage point gain on a hard coding benchmark. And it cost 11.9% less than running Opus solo for the same tasks.
But the Haiku numbers are more dramatic.
Haiku alone scored 19.7% on BrowseComp, a research-heavy browsing benchmark. Haiku with an Opus advisor scored 41.2%. That's more than double. And this Haiku+Opus pair costs 85% less than running Sonnet for the same task.
That 85% number changes budget conversations. If you're running Claude at scale for classification, extraction, or pattern-matching that occasionally needs complex reasoning, the Haiku+Opus pair is worth testing.
Here's a practical decision matrix:
| Use case | Best pair | Why |
|---|---|---|
| Complex coding, architecture decisions | Sonnet + Opus | Near-Opus accuracy on hard tasks; still cheaper than Opus |
| High-volume extraction and classification | Haiku + Opus | 85% cost savings vs Sonnet; quality far exceeds Haiku alone |
| Research agents, multi-step analysis | Sonnet + Opus | Confirmed gains on reasoning-heavy benchmarks |
| Cost-constrained consumer apps | Haiku + Opus | Meaningful quality improvement at a fraction of Sonnet cost |
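The matrix above reduces to a single question: is the task quality-sensitive or cost-sensitive? A sketch of that rule of thumb (the heuristic is derived from the table, not from Anthropic):

```typescript
type Pair = "sonnet+opus" | "haiku+opus";

// Pick an executor/advisor pair: quality-sensitive work (coding, architecture,
// research) gets Sonnet+Opus; everything cost-sensitive gets Haiku+Opus.
function pickPair(qualitySensitive: boolean): Pair {
  return qualitySensitive ? "sonnet+opus" : "haiku+opus";
}

console.log(pickPair(true)); // sonnet+opus
console.log(pickPair(false)); // haiku+opus
```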
One honest caveat. All of these benchmarks are Anthropic's own. No independent third-party results exist yet. This is a three-day-old beta. And Haiku+Opus scores approximately 29% below Sonnet on general tasks. If your bar is raw Sonnet-level quality, use Sonnet+Opus. If you're currently running Haiku and want a cost-effective upgrade, Haiku+Opus is the move.
For broader context on how Opus and Sonnet compare across real agent sessions, our practical guide to Claude Code covers the model selection tradeoffs in daily development.
TL;DR: Skip the advisor tool for single-turn queries, trivial tasks, and latency-critical paths. It adds the most value in multi-step agentic workflows with real decision points. On simple tasks, the executor won't invoke the advisor anyway — but adding it adds overhead and complexity for no gain.
A few specific patterns where the advisor tool adds no value:
Single-turn queries. If the user asks "Summarize this document" and there's only one step to take, the executor won't invoke the advisor. The tool sits idle. You've added a beta header and a tool definition for nothing.
Trivial mechanical tasks. Data formatting, lookups, regex transformations. These don't have decision points that trigger the advisor. Same result, more complexity.
Already-optimized Opus-only workflows. If you're already running Opus and quality is your only concern, the advisor adds nothing. You're effectively advising Opus with Opus.
Latency-critical paths. There's no extra network round-trip, but Opus generation still takes time. On paths where every 100ms counts, the advisor's internal invocation adds latency you haven't accounted for.
When you need deterministic behavior. The advisor introduces non-determinism. Opus may give different guidance on reruns. If your pipeline requires reproducible outputs, test carefully before relying on advisor calls.
Anthropic's Building Effective Agents guide makes the same point broadly: add complexity only when it demonstrably improves outcomes.
Three pairs are currently supported: Claude Haiku 4.5 as executor with Claude Opus 4.6 as advisor; Claude Sonnet 4.6 as executor with Claude Opus 4.6 as advisor; and Claude Opus 4.6 running as both executor and advisor. Any other combination returns an HTTP 400 error. The advisor must always be at least as capable as the executor.
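Since an unsupported combination only fails with an HTTP 400 at request time, a client-side pre-check fails faster. A sketch mirroring the three supported pairs listed above; the Haiku model ID string is an assumption based on the naming of the other two:

```typescript
// The three executor/advisor pairs the beta supports, per the list above.
// "claude-haiku-4-5" is an assumed model ID, following the naming pattern.
const SUPPORTED_PAIRS: ReadonlyArray<[string, string]> = [
  ["claude-haiku-4-5", "claude-opus-4-6"],
  ["claude-sonnet-4-6", "claude-opus-4-6"],
  ["claude-opus-4-6", "claude-opus-4-6"],
];

// Validate a pair before sending; the API itself returns HTTP 400 otherwise.
function isSupportedPair(executor: string, advisor: string): boolean {
  return SUPPORTED_PAIRS.some(([e, a]) => e === executor && a === advisor);
}

console.log(isSupportedPair("claude-sonnet-4-6", "claude-opus-4-6")); // true
console.log(isSupportedPair("claude-opus-4-6", "claude-haiku-4-5")); // false: advisor weaker than executor
```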
Yes. Claude Haiku 4.5 can be the executor with Claude Opus 4.6 as the advisor. In Anthropic's BrowseComp benchmarks, this pair improved Haiku's performance from 19.7% to 41.2% (more than double) while costing 85% less than Sonnet. For high-volume tasks that need occasional complex reasoning, this pair delivers better quality at a fraction of Sonnet's cost.
You're billed at each model's standard per-token rate. The executor (Sonnet or Haiku) generates at its lower rate. Opus generates the advisory response (~400-700 tokens) at the Opus rate. Total cost typically runs lower than running Opus alone for the same task. Advisor tokens are broken out separately in the usage object for clean cost attribution.
Yes. As of April 2026, the advisor tool requires the anthropic-beta: advisor-tool-2026-03-01 header. It's accessible through the standard Claude API with no special waitlist or application required. Enterprise customers with Zero Data Retention (ZDR) agreements can use it without changing their data handling setup. Contact your Anthropic account team for enterprise-specific arrangements.
The Opus-or-Sonnet decision used to be a binary tradeoff. You picked quality or you picked cost.
The Claude advisor API gives you a dial. Use Sonnet as your workhorse, bring in Opus on the hard calls, and pay less than you would running Opus full-time. Or go further with Haiku and let Opus double your quality for 85% less than Sonnet would cost.
One header and one tool definition to wire it into an existing agent. Anthropic's advisor tool documentation covers the full specification, including caching options and Anthropic's complete system prompt template.
If you're building visual workflows on top of Claude agents, Builder.io integrates with Claude for AI-powered content and development workflows.