bcdianco / operator

— Writing · April 22, 2026

Claude Opus 4.7: xhigh effort and task budgets are the parts that actually matter

tooling · claude-code · claude · ai-agents · automation

Claude Opus 4.7 shipped April 16. Here's the 48-hour read on what changed for operators running agentic client work — and what didn't.

Opus 4.7 is Anthropic's latest generally available model, and as of this week it's the default in Claude Code for all plan tiers. This matters specifically if you're running Claude Code for agentic client work: document processing, code generation against real codebases, or anything where the model makes multiple decisions across a session before waiting for your input.

Three things changed that an operator should care about. The xhigh effort level is now the default in Claude Code — previously the default was high. Task budgets entered public beta on the API, giving you a token ceiling on agentic sessions. And vision input resolution roughly tripled, from 1.15 to 3.75 megapixels [1]. Everything else is refinement: a new tokenizer, benchmark gains on SWE-bench Verified (80.8% → 87.6%) [2], the Claude Design tool in Labs. Worth knowing. Not worth reorganizing your week.

One thing that didn't change: the price. Same as Opus 4.6. The tokenizer is new, though. Text that ran at 30k tokens on 4.6 may run at 32–40k on 4.7, depending on content type — a 1x–1.35x multiplier on token spend [1]. Not free. Factor it in before assuming the upgrade is cost-neutral at volume.
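The multiplier math is worth running against your own volume before you migrate. A quick sketch, using the 1x–1.35x range from above; the per-token price and monthly volume here are invented placeholders, not Anthropic's pricing:

```python
# Illustrative tokenizer-delta math for a 4.6 -> 4.7 migration.
# The 1x-1.35x multiplier range comes from the article; the price
# and volume below are made-up examples, not real pricing.

MONTHLY_INPUT_TOKENS_46 = 50_000_000   # hypothetical current volume on 4.6
PRICE_PER_MTOK = 15.00                 # hypothetical input price, USD per million tokens

def monthly_cost(tokens: int, price_per_mtok: float) -> float:
    """Dollar cost for a given token volume."""
    return tokens / 1_000_000 * price_per_mtok

baseline = monthly_cost(MONTHLY_INPUT_TOKENS_46, PRICE_PER_MTOK)

# The same content re-tokenized on 4.7 lands somewhere in the multiplier range.
for multiplier in (1.0, 1.15, 1.35):
    cost = monthly_cost(int(MONTHLY_INPUT_TOKENS_46 * multiplier), PRICE_PER_MTOK)
    print(f"{multiplier:.2f}x -> ${cost:,.2f}/mo (+${cost - baseline:,.2f} vs 4.6)")
```

At the top of the range, that's a third more input spend for the same documents. Whether the accuracy gains cover it depends entirely on how much review time they save you.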

The kind of task xhigh actually moves

The test worth running, if you want to feel the upgrade: SOP document extraction. PDFs — part written instruction, part table, occasionally a screenshot of a spreadsheet someone decided was close enough — parsed into a JSON object that feeds an n8n or Make workflow: trigger conditions, responsible party, escalation threshold. The kind of parsing job every SMB client has hiding in a folder somewhere.

At high effort on Opus 4.6, this shape of workflow handles clean documents fine. Documents with ambiguous phrasing — "notify the team" without naming a team, relative deadline language ("within 24 hours") with no anchoring event — degrade. First-pass accuracy drops, and the failures are usually silent: the model guesses rather than flagging the ambiguity.

At xhigh on Opus 4.7, the failure mode shifts. Instead of silently writing "responsible_party": "operations" when the doc said "notify the team," the model is more likely to surface an explicit "ambiguous": true annotation in the JSON and note the missing reference. The portion that still needs human eyes becomes identifiable and tagged for review rather than buried in a clean-looking output.
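The contrast is easier to see side by side. These payloads are invented for illustration — the field names (responsible_party, ambiguous, review_note) are assumptions, not a schema Anthropic publishes; use whatever shape your n8n or Make workflow expects:

```python
# Hypothetical extraction outputs for the "notify the team" clause.
# All field names here are made up for illustration.

silent_guess = {            # the old failure mode: a confident wrong answer
    "responsible_party": "operations",
    "escalation_threshold": "24 hours",
}

flagged = {                 # the xhigh behavior described above: surfaced uncertainty
    "responsible_party": None,
    "ambiguous": True,
    "review_note": "Doc says 'notify the team' without naming one.",
    "escalation_threshold": "24 hours",
}

def needs_review(record: dict) -> bool:
    """Route flagged records to a human queue instead of straight into the workflow."""
    return bool(record.get("ambiguous")) or record.get("responsible_party") is None

review_queue = [r for r in (silent_guess, flagged) if needs_review(r)]
```

The routing check is the point: the first payload sails straight into automation looking correct, while the second self-selects into a human queue.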

That's the difference between a review queue and a mistake factory.

Token cost goes up at xhigh versus high — enough to model before switching on high-volume workloads, not enough to disqualify it on ambiguity-heavy ones. Run your own numbers on a representative batch before making xhigh the default across every session.

xhigh effort: what it actually changes

Effort in Claude Code controls how deeply the model reasons before acting. High is solid — it moves well on well-defined tasks. xhigh raises the floor. Claude revisits its own assumptions more often, takes extra passes before tool calls, and handles edge cases more reliably on tasks with fuzzy inputs [3].

Not for: clean, well-structured tasks with deterministic outputs. If your agentic loop produces good results at high effort, the extra token spend from xhigh won't return the difference. Save xhigh for messy work — unstructured documents, ambiguous instructions, anything where a human would naturally pause and re-read before acting.

The effort level is configurable. Drop it back to high for specific sessions via /config in Claude Code if you know the task is clean.

Task budgets: the feature agentic workflows actually needed

Long-running Claude Code sessions have had a cost transparency problem. Send Claude into a codebase or a folder of documents without constraints, and it runs until it's done — which can mean 40 minutes and 200k tokens if the scope is loose. You find out after.

Task budgets put a ceiling on a session. Set a token cap with /config task_budget 50000 in Claude Code. When Claude is about to exceed it, it pauses and checks in [3]. Not a hard kill — a handoff. It surfaces what it's completed and what remains. You decide whether to extend.

This is in public beta on Opus 4.7, accessed via the task-budgets-2026-03-13 beta header on the API.
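On the API side, a minimal call sketch. The beta header value is the one from the release notes above; the task_budget request field and model id are hypothetical names for illustration — check the beta documentation for the actual parameters:

```python
# Sketch of opting into the task-budgets beta on the Messages API.
# The beta header string comes from the article; "task_budget" and the
# model id are HYPOTHETICAL placeholders, not confirmed API fields.

BETA_HEADER = "task-budgets-2026-03-13"

def budgeted_request_kwargs(prompt: str, token_budget: int) -> dict:
    """Build kwargs for client.messages.create() with the beta opt-in."""
    return {
        "model": "claude-opus-4-7",                    # assumed model id
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
        "extra_headers": {"anthropic-beta": BETA_HEADER},
        "extra_body": {"task_budget": token_budget},   # hypothetical field name
    }

# Usage (requires the `anthropic` package and an API key):
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(**budgeted_request_kwargs("Parse this SOP...", 50_000))
```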

Run your next agentic batch under a token budget. When the session approaches the cap, the check-in surfaces what's been completed, estimates what's left, and hands control back. That visibility is the gain. The alternative is watching a progress bar and hoping.
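The check-in pattern is simple enough to mirror in your own orchestration code if you drive the API directly. A minimal sketch, assuming you track token usage per turn yourself — run_step and extend are stand-ins, not real SDK calls:

```python
# Minimal budget check-in loop, mirroring the handoff behavior described above.
# Everything here is illustrative: step() stands in for one agentic turn and
# returns (tokens_used, done, progress_summary).

from typing import Callable, Tuple

def run_with_budget(
    step: Callable[[], Tuple[int, bool, str]],
    budget: int,
    extend: Callable[[str, int], int],
) -> str:
    """Run steps until done; at the cap, surface progress and ask to extend.

    `extend(summary, spent)` returns extra tokens to grant, or 0 to stop.
    """
    spent = 0
    while True:
        tokens, done, summary = step()
        spent += tokens
        if done:
            return summary
        if spent >= budget:
            extra = extend(summary, spent)   # the "handoff": a human decides
            if extra <= 0:
                return summary               # stop, with partial progress surfaced
            budget += extra

# Example: three fake 20k-token steps against a 30k budget, one extension granted.
steps = iter([(20_000, False, "parsed 3/9 docs"),
              (20_000, False, "parsed 6/9 docs"),
              (20_000, True,  "parsed 9/9 docs")])
result = run_with_budget(lambda: next(steps), 30_000,
                         extend=lambda summary, spent: 25_000)
```

The design choice worth copying is that hitting the cap returns a summary either way: stopping early still tells you where the work stands instead of discarding it.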

Agentic work should be accountable. Task budgets make it accountable.

Vision at 3×: matters if images are in your stack

From 1.15MP to 3.75MP is a real improvement if vision is part of your workflow [1]. Smaller text in screenshots resolves cleanly. Dense diagrams stop requiring workaround prompts. The SWE-bench gains — partially driven by the model's improved ability to parse visual context in coding environments [2] — show up in workflows that depend on reading UI elements or interpreting technical diagrams.

The shape of workflow this matters for: photographed whiteboards from warehouse operations, handwritten inventory snapshots, dense schematics sent as JPEGs by clients who don't think in structured data. The 4.6 failure mode on that kind of input was plausible-looking fabrication — hallucinated line items on a dense board. A triple in resolution doesn't make the model honest, but it gives the model more signal to be honest with.

If your inputs are structured text, the vision improvement is real but not your headline reason to upgrade. The reasoning and accuracy gains matter more for that use case.

What I'd change and what I wouldn't

Worth changing: Opus 4.7 becomes the default in Claude Code for agentic work. xhigh effort stays on for document-heavy, ambiguity-prone workflows. Task budgets go on every agentic session where the scope isn't tightly bounded going in.

Don't change: for clean, structured extraction tasks — well-formatted data in, well-formatted data out — Sonnet 4.6 is still better value. The Opus 4.7 gains are most pronounced on messy inputs. The cost delta is real and worth modeling against your own workload. The choice between Opus and Sonnet is the same logic as the choice between Claude Code and a deterministic workflow tool — use the heavier option when the task actually needs the judgment. (That framework lives here if you haven't read it.)

Don't change yet: blanket token-budget estimates. The tokenizer is new; the actual delta on your workloads is worth measuring before you scale up. Run your standard task, check the token count, compare.

Opus 4.7 is a meaningful upgrade for operators running agentic work with messy inputs. Not a reinvention; Anthropic called it an incremental improvement, and that's accurate. But the xhigh default and task budgets together change the day-to-day feel of production sessions in ways that compound: less silent drift, more surfaced uncertainty, a cost ceiling you set up front. If that's the shape of the work you run for clients, it's worth the upgrade check.

Sources

[1] Anthropic — "Introducing Claude Opus 4.7" — https://www.anthropic.com/news/claude-opus-4-7

[2] AWS News Blog — "AWS Weekly Roundup: Claude Opus 4.7 in Amazon Bedrock, AWS Interconnect GA, and more (April 20, 2026)" — https://aws.amazon.com/blogs/aws/aws-weekly-roundup-claude-opus-4-7-in-amazon-bedrock-aws-interconnect-ga-and-more-april-20-2026/

[3] Anthropic — Claude Code Release Notes (v2.1.117, v2.1.116, April 2026) — https://github.com/anthropics/claude-code/releases

— Drafted with Claude, reviewed and edited by Bryan before publish.