If you're building with Claude, you'll hit this wall.
You pick Opus. The reasoning is brilliant. The invoice arrives. You switch to Sonnet. The price drops. So does the quality on anything hard. You pick Opus again for the difficult calls and Sonnet for everything else, and now you're managing two models, two contexts, and two sets of prompts.
Anthropic just shipped a third option that makes that whole dance unnecessary.
The Claude advisor API, in beta since April 9, 2026, lets you pair a fast executor model (Sonnet or Haiku) with Opus as an on-demand advisor. When the executor hits a decision it's not confident about, it calls the advisor mid-task. Opus weighs in. The executor continues. The whole thing happens inside a single API call. No second request. No context synchronization. No orchestration layer.
In Anthropic's benchmarks, Sonnet with an Opus advisor scored 74.8% on SWE-bench Multilingual versus 72.1% for Sonnet alone, and cost 11.9% less than running Opus solo. And the Haiku numbers are even more striking, but we'll get to those.
This post covers what the Claude advisor tool is, how it works server-side, and exactly how to add it to an existing agent.
Builder 2.0 is the harness built for exactly this kind of agent work — run 20+ Claude-powered agents in parallel, each in its own cloud container with browser preview, with Slack and Jira wired in so your whole team ships via auto-generated PRs. No orchestration glue, just working features in production.
TL;DR: The Claude advisor API is a beta feature that lets you designate Claude Opus as an on-demand advisor for a faster executor model (Sonnet or Haiku). The executor calls the advisor mid-task when it needs strategic input. Everything happens in one API call — no extra network round-trips, no orchestration code, no context syncing.
The advisor pattern itself isn't new. In 2023 (which in AI terms is like saying the Bronze Age), researchers at UC Berkeley published a paper titled "How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models." They found that small models trained to generate per-instance natural language advice could noticeably improve larger models' output. Anthropic built that same pattern directly into the Claude API.
The Anthropic advisor API adds a new tool type to your existing tools array. You enable it with a single beta header. Your executor model (the one doing the actual work) knows when to call the advisor. When it does, the call happens server-side. No round-trip. No client-side logic.
This is available today on the Claude API. It doesn't need a waitlist or special application for API access. Enterprise customers with Zero Data Retention (ZDR) agreements can use it without changing their data handling setup. The advisor is explicitly ZDR-eligible.
TL;DR: The executor model generates normally until it decides it needs help. It emits a signal the server intercepts, which pauses the executor and runs Opus on the full conversation history. Opus sends back ~400-700 tokens of advice — never shown to the user — and the executor resumes informed. One API call, transparent to the client.
Here's the server-side flow step by step:
- You send a `POST /v1/messages` request with the advisor tool defined in the `tools` array
- The executor model (Sonnet or Haiku) runs and generates output as normal
- When it hits a decision it wants help with, it emits a structured token block (`{"type": "server_tool_use", "name": "advisor"}`). That's the trigger.
- The server pauses the executor and runs Opus with the entire conversation history: the original prompt, every tool call made so far, and every result the executor has seen
- Opus generates an advisory message — a plan, a correction, a strategic next step — in approximately 400-700 tokens
- That advice is injected back into the assistant message stream as an `advisor_tool_result` block. The user never sees this.
- The executor resumes, now informed by Opus's guidance, and continues generating
Nothing changes on the client side. One request in, one response out.
Two things to note. The advisor reads full conversation context but can only return text advice. Its tokens bill at the Opus rate but don't count against the executor's `max_tokens` cap. Both models' tokens appear in the `usage` object, so cost attribution is clean.
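From the client, you can spot advisor activity by scanning the returned content blocks. A minimal sketch, assuming the block type names described above (`server_tool_use` and `advisor_tool_result`); the exact block shapes here are illustrative, not taken from official documentation:

```typescript
// Hypothetical content-block shapes, based on the type names described above.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "server_tool_use"; name: string; id: string }
  | { type: "advisor_tool_result"; tool_use_id: string; content: string };

// Count how many times the executor consulted the advisor in one response.
function countAdvisorCalls(blocks: ContentBlock[]): number {
  return blocks.filter(
    (b) => b.type === "server_tool_use" && b.name === "advisor"
  ).length;
}

// Example with mocked blocks (the advice content is never user-facing).
const blocks: ContentBlock[] = [
  { type: "text", text: "Let me plan the refactor." },
  { type: "server_tool_use", name: "advisor", id: "srvtool_1" },
  {
    type: "advisor_tool_result",
    tool_use_id: "srvtool_1",
    content: "Use a bounded channel for the worker pool.",
  },
  { type: "text", text: "Here is the worker pool implementation..." },
];

console.log(countAdvisorCalls(blocks)); // 1
```

A counter like this is handy for verifying that your `max_uses` setting matches how often the executor actually asks for help.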
TL;DR: Add the `anthropic-beta: advisor-tool-2026-03-01` header to your request and include `{"type": "advisor_20260301", "name": "advisor", "model": "claude-opus-4-6"}` in your tools array. Set `max_uses` to cap advisor calls — the primary cost control lever. That's the full integration: same endpoint, same SDK version, no orchestration changes required. Your existing agent code stays unchanged.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  tools: [
    {
      type: "advisor_20260301",
      name: "advisor",
      model: "claude-opus-4-6",
      max_uses: 3,
    },
  ],
  messages: [
    {
      role: "user",
      content:
        "Refactor this Go service to use a worker pool with graceful shutdown.",
    },
  ],
});

console.log(response.content);
```

`max_uses` caps how many times the advisor can be called per request. When that limit is hit, further advisor requests return a `max_uses_exceeded` block and the executor continues without more advice. This is your primary cost control lever. Set it based on task complexity.
`caching` enables advisor-side prompt caching. Add `"caching": {"type": "ephemeral", "ttl": "5m"}` if you're expecting three or more advisor calls in a single session. It lets Opus skip re-processing unchanged context on repeat calls, which saves tokens.
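Putting both knobs together, the tool definition might look like the sketch below. The `caching` field shape is taken directly from the snippet above; treat the rest as illustrative:

```typescript
// Advisor tool definition with both cost controls applied.
const advisorTool = {
  type: "advisor_20260301",
  name: "advisor",
  model: "claude-opus-4-6",
  // Cap advisor invocations per request: the primary cost lever.
  max_uses: 3,
  // Ephemeral prompt caching, worthwhile when expecting 3+ advisor calls.
  caching: { type: "ephemeral", ttl: "5m" },
};

console.log(JSON.stringify(advisorTool, null, 2));
```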
System prompt guidance. Anthropic's recommended approach is to tell the executor when to call the advisor. Their suggested template:
"You have access to an `advisor` tool backed by a stronger model. Call the advisor before substantive work — before writing, before committing to an interpretation, before building on an assumption. Also call advisor when you believe the task is complete, before delivering output. On tasks longer than a few steps, call advisor at least once before finalizing."
In practice, most of the value comes from one or two advisor calls per task: once early for orientation, once before finalizing output.
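One way to keep the prompt and the tool definition in sync is a small request builder. A sketch: the prompt text is Anthropic's suggested template quoted above, while the builder function and its default of two advisor calls are illustrative choices:

```typescript
// Anthropic's suggested system-prompt template for steering advisor usage.
const ADVISOR_SYSTEM_PROMPT =
  "You have access to an advisor tool backed by a stronger model. " +
  "Call the advisor before substantive work — before writing, before committing " +
  "to an interpretation, before building on an assumption. Also call advisor " +
  "when you believe the task is complete, before delivering output. On tasks " +
  "longer than a few steps, call advisor at least once before finalizing.";

// Build the params object passed to client.messages.create().
// Defaults to two advisor calls: once early, once before finalizing.
function buildAdvisorRequest(userContent: string, maxUses = 2) {
  return {
    model: "claude-sonnet-4-6",
    max_tokens: 4096,
    system: ADVISOR_SYSTEM_PROMPT,
    tools: [
      {
        type: "advisor_20260301",
        name: "advisor",
        model: "claude-opus-4-6",
        max_uses: maxUses,
      },
    ],
    messages: [{ role: "user" as const, content: userContent }],
  };
}

console.log(buildAdvisorRequest("Plan the migration.").tools[0].max_uses); // 2
```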
Production note: Priority Tier on the executor model doesn't extend to the advisor. If you're running production workloads, track advisor token usage separately in the usage object. It's broken out by model tier, so cost attribution is clean.
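A cost-attribution helper might look like the sketch below. The `usage` field names and the per-million-token rates are placeholders, not published values; substitute the actual field names from your responses and current pricing:

```typescript
// Hypothetical usage shape: advisor tokens broken out from executor tokens.
interface AdvisorUsage {
  input_tokens: number; // executor input
  output_tokens: number; // executor output
  advisor_output_tokens: number; // Opus advisory tokens, billed at the Opus rate
}

// Placeholder per-million-output-token rates in USD (NOT real prices).
const EXECUTOR_RATE = 15;
const ADVISOR_RATE = 75;

function attributeCost(usage: AdvisorUsage) {
  const executorCost = (usage.output_tokens / 1_000_000) * EXECUTOR_RATE;
  const advisorCost = (usage.advisor_output_tokens / 1_000_000) * ADVISOR_RATE;
  return { executorCost, advisorCost, total: executorCost + advisorCost };
}

const cost = attributeCost({
  input_tokens: 2_000,
  output_tokens: 3_000,
  advisor_output_tokens: 600, // within the ~400-700 token advisory range
});
console.log(cost); // total ≈ 0.09 under these placeholder rates
```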
TL;DR: For quality-sensitive tasks — coding agents, architecture decisions, complex research — use Sonnet as executor with Opus as advisor. You get near-Opus accuracy for less than Opus alone. For high-volume, cost-sensitive workloads, the Haiku+Opus pair is worth serious consideration: 85% cheaper than Sonnet, and dramatically better quality than Haiku alone.
The announcement focuses on Sonnet+Opus, and for good reason. Sonnet with an Opus advisor scored 74.8% on SWE-bench Multilingual, up from 72.1% for Sonnet alone. That's a 2.7 percentage point gain on a hard coding benchmark. And it cost 11.9% less than running Opus solo for the same tasks.
But the Haiku numbers are more dramatic.
Haiku alone scored 19.7% on BrowseComp, a research-heavy browsing benchmark. Haiku with an Opus advisor scored 41.2%. That's more than double. And this Haiku+Opus pair costs 85% less than running Sonnet for the same task.
That 85% number changes budget conversations. If you're running Claude at scale for classification, extraction, or pattern-matching that occasionally needs complex reasoning, the Haiku+Opus pair is worth testing.
Here's a practical decision matrix:
| Use case | Best pair | Why |
|---|---|---|
| Complex coding, architecture decisions | Sonnet + Opus | Near-Opus accuracy on hard tasks; still cheaper than Opus |
| High-volume extraction and classification | Haiku + Opus | 85% cost savings vs Sonnet; quality far exceeds Haiku alone |
| Research agents, multi-step analysis | Sonnet + Opus | Confirmed gains on reasoning-heavy benchmarks |
| Cost-constrained consumer apps | Haiku + Opus | Meaningful quality improvement at a fraction of Sonnet cost |
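The matrix above reduces to a single question: is the task quality-sensitive or cost-sensitive? A sketch of that rule of thumb (the heuristic is derived from the table, not from Anthropic):

```typescript
type Pair = "sonnet+opus" | "haiku+opus";

// Pick an executor/advisor pair: quality-sensitive work (coding, architecture,
// research) gets Sonnet+Opus; everything cost-sensitive gets Haiku+Opus.
function pickPair(qualitySensitive: boolean): Pair {
  return qualitySensitive ? "sonnet+opus" : "haiku+opus";
}

console.log(pickPair(true)); // sonnet+opus
console.log(pickPair(false)); // haiku+opus
```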
One honest caveat. All of these benchmarks are Anthropic's own. No independent third-party results exist yet. This is a three-day-old beta. And Haiku+Opus scores approximately 29% below Sonnet on general tasks. If your bar is raw Sonnet-level quality, use Sonnet+Opus. If you're currently running Haiku and want a cost-effective upgrade, Haiku+Opus is the move.
For broader context on how Opus and Sonnet compare across real agent sessions, our practical guide to Claude Code covers the model selection tradeoffs in daily development.
TL;DR: Skip the advisor tool for single-turn queries, trivial tasks, and latency-critical paths. It adds the most value in multi-step agentic workflows with real decision points. On simple tasks, the executor won't invoke the advisor anyway — but adding it adds overhead and complexity for no gain.
A few specific patterns where the advisor tool adds no value:
Single-turn queries. If the user asks "Summarize this document" and there's only one step to take, the executor won't invoke the advisor. The tool sits idle. You've added a beta header and a tool definition for nothing.
Trivial mechanical tasks. Data formatting, lookups, regex transformations. These don't have decision points that trigger the advisor. Same result, more complexity.
Already-optimized Opus-only workflows. If you're already running Opus and quality is your only concern, the advisor adds nothing. You're effectively advising Opus with Opus.
Latency-critical paths. There's no extra network round-trip, but Opus generation still takes time. On paths where every 100ms counts, the advisor's internal invocation adds latency you haven't accounted for.
When you need deterministic behavior. The advisor introduces non-determinism. Opus may give different guidance on reruns. If your pipeline requires reproducible outputs, test carefully before relying on advisor calls.
Anthropic's Building Effective Agents guide makes the same point broadly: add complexity only when it demonstrably improves outcomes.
Three pairs are currently supported: Claude Haiku 4.5 as executor with Claude Opus 4.6 as advisor; Claude Sonnet 4.6 as executor with Claude Opus 4.6 as advisor; and Claude Opus 4.6 running as both executor and advisor. Any other combination returns an HTTP 400 error. The advisor must always be at least as capable as the executor.
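Since an unsupported combination only fails with an HTTP 400 at request time, a client-side pre-check fails faster. A sketch mirroring the three supported pairs listed above; the Haiku model ID string is an assumption based on the naming of the other two:

```typescript
// The three executor/advisor pairs the beta supports, per the list above.
// "claude-haiku-4-5" is an assumed model ID, following the naming pattern.
const SUPPORTED_PAIRS: ReadonlyArray<[string, string]> = [
  ["claude-haiku-4-5", "claude-opus-4-6"],
  ["claude-sonnet-4-6", "claude-opus-4-6"],
  ["claude-opus-4-6", "claude-opus-4-6"],
];

// Validate a pair before sending; the API itself returns HTTP 400 otherwise.
function isSupportedPair(executor: string, advisor: string): boolean {
  return SUPPORTED_PAIRS.some(([e, a]) => e === executor && a === advisor);
}

console.log(isSupportedPair("claude-sonnet-4-6", "claude-opus-4-6")); // true
console.log(isSupportedPair("claude-opus-4-6", "claude-haiku-4-5")); // false: advisor weaker than executor
```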
Yes. Claude Haiku 4.5 can be the executor with Claude Opus 4.6 as the advisor. In Anthropic's BrowseComp benchmarks, this pair improved Haiku's performance from 19.7% to 41.2% (more than double) while costing 85% less than Sonnet. For high-volume tasks that need occasional complex reasoning, this pair delivers better quality at a fraction of Sonnet's cost.
You're billed at each model's standard per-token rate. The executor (Sonnet or Haiku) generates at its lower rate. Opus generates the advisory response (~400-700 tokens) at the Opus rate. Total cost typically runs lower than running Opus alone for the same task. Advisor tokens are broken out separately in the usage object for clean cost attribution.
Yes. As of April 2026, the advisor tool requires the anthropic-beta: advisor-tool-2026-03-01 header. It's accessible through the standard Claude API with no special waitlist or application required. Enterprise customers with Zero Data Retention (ZDR) agreements can use it without changing their data handling setup. Contact your Anthropic account team for enterprise-specific arrangements.
The Opus-or-Sonnet decision used to be a binary tradeoff. You picked quality or you picked cost.
The Claude advisor API gives you a dial. Use Sonnet as your workhorse, bring in Opus on the hard calls, and pay less than you would running Opus full-time. Or go further with Haiku and let Opus double your quality for 85% less than Sonnet would cost.
One header and one tool definition to wire it into an existing agent. Anthropic's advisor tool documentation covers the full specification, including caching options and Anthropic's complete system prompt template.
If you're building visual workflows on top of Claude agents, Builder.io integrates with Claude for AI-powered content and development workflows.