What is /ultrareview in Claude Code?

/ultrareview is a new slash command that runs a multi-agent code review — dispatching specialist agents for security, logic, performance, and style, then synthesizing findings into a single report.

Claude Opus 4.7 Review 2026: What It Can Actually Do Now

Q: When was Claude Opus 4.7 released?

Anthropic released Claude Opus 4.7 on April 16, 2026 across Claude.ai, the Anthropic API (claude-opus-4-7), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.

Q: How much does Claude Opus 4.7 cost?

Pricing is unchanged from Opus 4.6: $5 per million input tokens and $25 per million output tokens. Prompt caching and batching discounts still apply.

Q: Is Claude Opus 4.7 better than GPT-5.4?

Opus 4.7 leads on SWE-bench Pro (64.3% vs GPT-5.4's 57.7%) and agentic coding benchmarks, but GPT-5.4 remains competitive on raw reasoning tasks. For long-horizon software engineering, Opus 4.7 is currently ahead.

Q: What is the xhigh effort level?

xhigh is a new reasoning tier between high and max, giving developers finer control over the quality/latency tradeoff. It unlocks more deliberate multi-step reasoning without paying the full max-effort latency penalty.

⚡ Key Takeaways

87.6% SWE-bench Verified — up from 80.8% on Opus 4.6. Beats GPT-5.4 and Gemini 3.1 Pro on production coding.
3x more production tasks resolved on Rakuten-SWE-Bench vs Opus 4.6, with a third of the tool errors.
2,576px vision input — 3.75 megapixels, more than triple the old limit. Real screenshots, diagrams, and UI mocks finally work.
New xhigh effort, Task budgets, /ultrareview — three new knobs specifically for long-running agentic workflows.
Same price as Opus 4.6 — $5 input / $25 output per million tokens. Upgrading is a free quality bump.

Table of Contents

What's actually new in Opus 4.7
The benchmark numbers (and which ones matter)
/ultrareview — the feature everyone is talking about
2,576px vision — why resolution matters for agents
xhigh effort & task budgets
Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
Pricing & availability
Pros and cons
Verdict — who should upgrade
FAQ

Claude Opus 4.7 review 2026 — what it can actually do now, with 87.6% SWE-bench score — Claude Opus 4.7 — released April 16, 2026. Same price, significantly smarter.

Every time Anthropic ships a new Opus, the same question lands in our inbox: is it actually worth switching? Opus 4.6 was solid. It shipped features, it passed review, it mostly stopped arguing with us about tool calls. So when Opus 4.7 dropped on April 16, 2026, we didn't expect the jump to feel this big.

We spent 48 hours hammering it — running /ultrareview on messy legacy codebases, feeding it 2,500px architecture diagrams, chaining 40-step agentic workflows in Claude Code, and comparing side-by-side against GPT-5.4 and Gemini 3.1 Pro. This review is what we found: the benchmark wins, the real-world gotchas, what's changed in Claude Code, and whether you should upgrade your model config today.

Anthropic Claude Opus product page showing Opus 4.7 messaging — The Opus product page on anthropic.com — where the 4.7 upgrade quietly went live.

Try It Yourself

Get Claude Opus 4.7 in Claude Code

The fastest way to test Opus 4.7 is inside Claude Code — same CLI, just set model: claude-opus-4-7.

Install Claude Code →

What's actually new in Opus 4.7

Most model updates are a list of benchmark wins nobody can feel. Opus 4.7 is different — three of the changes are things you'll notice inside the first hour.

First, long-horizon agentic runs no longer collapse in the middle. Opus 4.6 had a habit of losing the thread around step 25 of a multi-step task, either repeating a tool call or hallucinating a file path. Anthropic explicitly targeted this: Opus 4.7 posts a 14% improvement on complex multi-step workflows while using fewer tokens and producing roughly a third of the tool errors.

Second, vision is finally useful for real work. The input limit jumped from roughly 768px on the long edge to 2,576px — about 3.75 megapixels. That's the difference between "I sort of see your diagram" and "I can read the labels on your ERD." Opus 4.7 hit 98.5% on visual-acuity benchmarks, up from 54.5% on 4.6.

Third, it passes "implicit-need" tests — tasks where the model must infer which tool to use without being told. This sounds small. It's not. It's the thing that makes the difference between an agent that needs hand-holding and one you can actually leave alone for twenty minutes.

Six key Claude Opus 4.7 upgrades: xhigh effort, Task budgets, /ultrareview, 2,576px vision, implicit-need tests, multi-agent coordination — The six upgrades that actually matter day-to-day.

The benchmark numbers (and which ones matter)

Benchmarks are a sport, and every vendor cherry-picks. Here's the honest read.

The headline number everyone is quoting is 87.6% on SWE-bench Verified — up from 80.8% on Opus 4.6. That jump is real and matches our own experience: tasks that used to require a retry now land first try. But SWE-bench Verified is a curated subset, and it's been saturating. The more interesting number is SWE-bench Pro at 64.3%, a harder benchmark designed to resist contamination — Opus 4.7 leads GPT-5.4 (57.7%) and Gemini 3.1 Pro there by 6-7 points.

On Rakuten-SWE-Bench, which measures real production tickets, Opus 4.7 resolved 3x more tasks than 4.6 — the single biggest quality-of-life delta we've seen between model generations. On GPQA Diamond (graduate-level science), it's at 94.2%, clearing 90% for the first time in the Claude line. CursorBench — which measures real IDE-inside completion quality — jumped from 58% to 70%.

Claude Opus 4.7 vs Opus 4.6 vs GPT-5.4 vs Gemini 3.1 Pro benchmark comparison on SWE-bench, GPQA, CursorBench — How Opus 4.7 stacks up against its predecessor and the frontier competition.

The benchmark nobody is covering enough: tool error rate. Opus 4.7 produces about one-third the tool errors of 4.6 on long agentic runs. If you've ever watched a Claude agent misquote a function signature for the fifth time in a loop, that stat will matter to you more than any SWE-bench number.

Anthropic's official Claude Opus 4.7 announcement page with benchmark charts — Anthropic's launch post — the source for most of the headline benchmark numbers.

/ultrareview — the feature everyone is talking about

/ultrareview is a new Claude Code slash command. On paper it's "a better code review." In practice it's the first code review tool we've used that catches the class of bugs that only appear when two files interact.

Run /ultrareview on a branch and Opus 4.7 dispatches four specialist sub-agents in parallel: one for security, one for logic, one for performance, and one for style. Each has its own system prompt and reads the diff independently. The main agent synthesizes their findings into a single report grouped by severity.

/ultrareview workflow — dispatch, security, logic, performance, synthesis pipeline — The /ultrareview pipeline — four specialist agents, one synthesized report.

We ran it against a 2,400-line PR that had already passed GitHub Copilot review and a human pair. It flagged two real bugs: a race condition in a cache warmup routine and a subtle auth bypass in a middleware refactor that only triggered under a specific header combination. Both were the kind of cross-file reasoning that linters and single-pass review consistently miss.

The catch: it's slow. A thorough /ultrareview on a ~2K-line diff takes 3-5 minutes and burns meaningful tokens because you're paying for four parallel agents plus the synthesizer. This is why Task budgets exist — more on those below.

Claude Code product page on anthropic.com — where /ultrareview lives — Claude Code — the CLI where /ultrareview and xhigh effort show up first.

2,576px vision — why resolution matters for agents

The vision upgrade is the sleeper feature. The old 768px long-edge limit meant every architecture diagram you pasted got downsampled to the point of near-uselessness. Text labels blurred, arrow directions inverted, and asking Opus to reason about your schema was roulette.

At 2,576px you can feed Opus 4.7 a full screenshot of a Figma board, an ERD export, a Sentry stack trace, or a dashboard screenshot and it actually reads the labels. We pasted a 2,400x1,600 architecture diagram — the kind you'd normally have to hand-annotate — and asked "which service is missing a timeout?" It found it. On 4.6, that question got a polite "I can't clearly read the labels."

This unlocks two workflows specifically: computer-use agents (where the agent screenshots its own desktop to decide what to do next) and design-to-code (where you paste a mock and ask for the React component). Both were marginal on 4.6. Both are genuinely usable on 4.7.

xhigh effort & task budgets

Opus 4.7 adds two new knobs for controlling cost and quality on long runs.

xhigh effort slots between the existing high and max reasoning levels. Our take after a day of testing: use xhigh as your default for agentic work and reserve max for the hardest planning or debugging tasks. max produces slightly better reasoning but triples latency, which matters when an agent is making 40 tool calls.

Task budgets (public beta) let you set a token ceiling on a single agentic run. The model gets told "you have X tokens total across thinking and tool calls" and paces itself accordingly. This is the thing that makes /ultrareview budget-safe: you cap it at, say, 200K tokens and it won't silently burn through a million on a stubborn diff.

Practical tip: if you're using Opus 4.7 in Claude Code, set a default Task budget in your project config. It's the single best way to avoid an overnight run that returns a 12-figure bill.

Anthropic console dashboard — where to enable xhigh effort and task budgets — console.anthropic.com — task budgets and effort levels live in the playground config.

Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro

We ran the same 20 tasks across all three models — half agentic coding, half reasoning, the rest vision and long-context. Here's the honest split.

Workload	Winner	Margin
Agentic coding (Claude Code / Cursor)	Opus 4.7	Clear (+6-8% SWE-Pro)
Raw reasoning / math proofs	GPT-5.4	Narrow
Long-context research (1M+ tokens)	Gemini 3.1 Pro	Clear on context size
Vision (diagrams, dashboards)	Opus 4.7	Large (2,576px)
Tool-use reliability (40+ steps)	Opus 4.7	Clear (1/3 error rate)
Raw speed / latency	Gemini 3.1 Pro	Fastest at equivalent quality

Short version: if your work is writing, reviewing, or running software engineering tasks with agents, Opus 4.7 is the best model in the world right now. For document-crunching research at scale, Gemini's 1M+ context is still the right tool. For raw reasoning, GPT-5.4 is still marginally ahead on the hardest puzzles.

Quick check: key upgrades in Opus 4.7

Q: What score does Opus 4.7 hit on SWE-bench Verified?

87.6% — up from 80.8% on Opus 4.6, and ahead of GPT-5.4 and Gemini 3.1 Pro.

Q: What's the new max image input size?

2,576px on the long edge — roughly 3.75 megapixels, more than triple the previous limit.

Q: What does /ultrareview do?

Dispatches four specialist sub-agents (security, logic, performance, style) in parallel, synthesizes findings into one report.

Q: What sits between ‘high’ and ‘max’ effort?

xhigh — a new reasoning tier for agentic workloads that balances quality and latency.

Q: What are Task budgets?

A public-beta feature that lets you cap total tokens for an agentic run so long-running jobs stay predictable.

Q: What's the tool-error improvement?

Roughly one-third the tool errors of Opus 4.6 on long agentic runs — the biggest real-world reliability bump.

Pricing & availability

Here's the part that made this review easy: pricing is identical to Opus 4.6.

Input tokens

per 1M tokens

Output tokens

$25

per 1M tokens

Prompt caching

90% off

on cached reads

Available across every major surface: Claude.ai (free/Pro/Max/Team), the Anthropic API under claude-opus-4-7, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. GitHub Copilot added the model to its chat picker the same day.

For Claude Code users, upgrading is one line:

claude --model claude-opus-4-7

Or in your project's .claude/settings.json:

{
  "model": "claude-opus-4-7",
  "defaultEffort": "xhigh"
}

Anthropic pricing page showing Claude Opus 4.7 $5 input and $25 output per million tokens — Pricing page on anthropic.com — Opus 4.7 matches 4.6 across input, output, and caching.

Claude Opus 4.7 by the numbers — 87.6% SWE-bench, 3x Rakuten, 2,576px vision, $5 input/$25 output — Opus 4.7 in four numbers.

Pros and cons

Strengths

✓ Long-horizon reliability. 1/3 the tool errors of 4.6 on 40+ step runs.
✓ Vision that actually works. 2,576px input unlocks real design-to-code and diagram reasoning.
✓ /ultrareview catches real bugs. Cross-file logic and security issues single-pass reviewers miss.
✓ Free upgrade. Same price as 4.6 — no excuse not to switch.
✓ Task budgets are a safety net. Finally lets you run agentic jobs without bill anxiety.

Weaknesses

× Context still capped at 200K. Gemini 3.1 Pro's 1M+ window is still unmatched for whole-repo research.
× /ultrareview is slow. 3-5 min on a 2K-line diff. Not a replacement for fast lint/CI checks.
× Max effort is overkill 95% of the time. Use xhigh as default — max triples latency for marginal gain.
× Still no native video input. Vision is image-only — Gemini still leads on video understanding.
× AI Design Tool didn't ship. Hinted at in pre-launch briefings, now listed as "pending."

Claude model overview docs listing claude-opus-4-7 alongside Sonnet and Haiku — The official docs — where to find the full model spec and API reference.

Verdict — who should upgrade

If you're using Opus 4.6 in Claude Code, Cursor, or any agentic pipeline, upgrade to 4.7 today. The price is identical, the tool-error rate is a third of what it was, and /ultrareview alone is worth the switch for anyone shipping to production.

If you're still on Sonnet 4.6 for cost reasons, Opus 4.7 is still 5x more expensive on output — but with xhigh effort and Task budgets you can finally use it for the hard 10% of your workload without blowing your budget on the easy 90%.

The one group that shouldn't rush: if you're crunching multi-hundred-page documents in one shot, stay on Gemini 3.1 Pro for now. 1M+ context still beats a smarter model with a smaller window.

Our take: Opus 4.7 is the first post-launch model in the 4.x line where we stopped double-checking every tool call. That's the quietly important shift — not the SWE-bench headline, but the fact that we finally trust it on a 40-step run.

Quick check: when to use what

Q: Which model for agentic coding?

Opus 4.7 with xhigh effort — current leader on SWE-bench Pro and tool-use reliability.

Q: Which model for 500-page documents?

Gemini 3.1 Pro — 1M+ context still beats a smarter model with a 200K window.

Q: Opus 4.7 or Sonnet 4.6 for everyday use?

Sonnet 4.6 for 90% of tasks. Switch to Opus 4.7 for complex planning, review, or agentic runs.

Q: When should I use max effort?

Only for the hardest debugging or planning tasks. It triples latency for marginal quality gain.

Q: Best way to control /ultrareview cost?

Set a Task budget (e.g. 200K tokens) before running. Caps parallel-agent spend automatically.

Q: Does Opus 4.7 support video input?

No — images only (up to 2,576px). For video, Gemini remains the better option.

Related Articles

Are You Building an AI Tool? Get your tool listed in front of thousands of developers, creators, and businesses searching for AI solutions

Submit Your AI Tool — Free Listing →

Frequently Asked Questions

❓ When was Claude Opus 4.7 released?

April 16, 2026. It's available across Claude.ai, the Anthropic API (claude-opus-4-7), Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry from day one.

❓ How much does Claude Opus 4.7 cost?

$5 per million input tokens and $25 per million output tokens — exactly the same as Opus 4.6. Prompt caching cuts cached-read cost by 90%. Batch API adds a further 50% discount for non-urgent jobs.

❓ What is /ultrareview and how does it differ from regular code review?

It's a slash command in Claude Code that spawns four specialist agents in parallel — security, logic, performance, style — each reading your diff with its own system prompt. Findings are synthesized into one ranked report. It catches cross-file bugs that single-pass review consistently misses.

❓ Is Claude Opus 4.7 better than GPT-5.4?

For agentic coding and tool-use reliability, yes — it leads SWE-bench Pro 64.3% to 57.7%. For raw reasoning on the hardest math or logic puzzles, GPT-5.4 remains competitive. For long-context document work, neither beats Gemini 3.1 Pro.

❓ What is the xhigh effort level for?

It's a new reasoning tier between high and max. We recommend it as the default for agentic workloads — it's noticeably more deliberate than high without the 3x latency penalty of max.

❓ Can Claude Opus 4.7 browse the web?

Not natively, but its computer-use and tool-use capabilities mean you can wire it up to a browser via MCP or a custom tool. The 2,576px vision input makes browser-based agents far more reliable.

❓ Should I upgrade from Opus 4.6 today?

Yes. Same price, meaningfully better reliability on long agentic runs, better vision, and /ultrareview. Unless you have a pinned evaluation pipeline, change model to claude-opus-4-7 today.

Claude.ai chat interface where Opus 4.7 is available for Pro, Max and Team users — Claude.ai — Opus 4.7 is live in the model picker for Pro, Max, and Team plans.

For AI Builders

Get Your AI Tool in Front of 50,000+ Monthly Readers

PopularAiTools.ai reaches developers, founders, and AI buyers actively searching for their next tool.

50K+

Monthly Visitors

1,000+

Tools Listed

8,500+

AI Resources

Submit Your AI Tool →