Anthropic just surprise-launched Claude Opus 4.7 — and it’s the most significant Opus upgrade since the Claude 4 series debuted. The model is available now across all Claude products, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
No teaser. No countdown. Just dropped.
🔍 The Bottom Line
Claude Opus 4.7 is a direct upgrade to Opus 4.6 with major gains in coding, long-running autonomy, and vision. It handles hours-long tasks with minimal supervision, verifies its own outputs before reporting back, and introduces a new “xhigh” effort level. Pricing stays the same: $5/M input, $25/M output.
⚡ What’s New
1M context window. Opus 4.7 can process up to one million tokens in a single conversation — roughly 750,000 words. That’s an entire codebase in context.
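A quick back-of-envelope check of what fits in that window. The 1M-token limit is from the announcement; the ~4 characters-per-token ratio is a common rough heuristic, not a measured figure for this model's tokenizer, so treat the result as an estimate:

```python
# Rough check of whether a codebase fits in a 1M-token context window.
# CHARS_PER_TOKEN is a heuristic assumption, not the model's real ratio.

CONTEXT_WINDOW = 1_000_000  # tokens, per the announcement
CHARS_PER_TOKEN = 4         # rough heuristic for English text and code

def estimated_tokens(total_chars: int) -> int:
    """Estimate token count from raw character count."""
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(total_chars: int, reserve: int = 50_000) -> bool:
    """Leave `reserve` tokens of headroom for instructions and output."""
    return estimated_tokens(total_chars) <= CONTEXT_WINDOW - reserve

# A ~3 MB codebase is roughly 750k tokens under this heuristic -- it fits.
print(fits_in_context(3_000_000))  # True
```
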
Autonomous verification. The model doesn’t just execute tasks — it devises ways to check its own work before returning results. In testing, it autonomously built a complete Rust text-to-speech engine from scratch, then fed its output through a speech recognizer to verify it matched the Python reference.
xhigh effort level. A new reasoning tier between “high” and “max” gives developers finer control over the tradeoff between thinking depth and latency. Claude Code now defaults to xhigh.
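As a sketch of how an effort tier might be selected per request: the `effort` field name, the `"xhigh"` value's position in the tier list, and the model id below are all assumptions drawn from this announcement, not a documented API shape — check the official API reference before using any of them.

```python
# Hypothetical request payload showing where an effort level would sit.
# Field names and the model id are assumptions, not a documented schema.

EFFORT_LEVELS = {"low", "medium", "high", "xhigh", "max"}

def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Build a request dict with a validated effort tier (hypothetical shape)."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort}")
    return {
        "model": "claude-opus-4-7",  # hypothetical model id
        "max_tokens": 4096,
        "effort": effort,            # new tier between "high" and "max"
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor this module.", effort="xhigh")
print(req["effort"])  # xhigh
```
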
3.75 megapixel vision. Opus 4.7 accepts images up to 2,576 pixels on the long edge — more than three times the resolution of prior Claude models. Computer-use agents can now read dense screenshots and complex diagrams.
Task budgets. A new API feature lets developers guide Claude’s token spend across long runs — crucial for managing costs on multi-hour autonomous workflows.
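The idea behind a task budget can be illustrated locally: cap total token spend across a long run and stop before overshooting. This is a sketch of the concept only, not the actual API feature:

```python
# Local illustration of a token budget for a long-running agentic loop.
# This models the *concept*; it is not the API's task-budget feature.

class TokenBudget:
    def __init__(self, limit: int):
        self.limit = limit
        self.spent = 0

    def record(self, tokens: int) -> None:
        """Log tokens consumed by one step of the run."""
        self.spent += tokens

    @property
    def remaining(self) -> int:
        return max(self.limit - self.spent, 0)

    def exhausted(self) -> bool:
        return self.spent >= self.limit

budget = TokenBudget(limit=500_000)  # e.g. a cap for a multi-hour run
for step_tokens in [120_000, 200_000, 150_000]:
    if budget.exhausted():
        break  # stop the run before blowing past the cap
    budget.record(step_tokens)

print(budget.remaining)  # 30000
```
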
💻 Coding: The Big Jump
The coding improvements are where Opus 4.7 shines brightest:
- 13% lift on Anthropic’s 93-task internal coding benchmark over Opus 4.6
- 70% on CursorBench versus Opus 4.6’s 58%
- 3x more production tasks resolved on Rakuten-SWE-Bench
- 14% improvement on multi-step agentic workflows while using fewer tokens
- One-third the tool errors compared to Opus 4.6
Early testers report that low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6. It catches its own logical faults during planning, cuts meaningless wrapper functions and scaffolding, and fixes its own code as it goes.
“It’s the cleanest jump we’ve seen since the move from Sonnet 3.7 to the Claude 4 series.” — Early tester feedback
🛡️ Cybersecurity Safeguards
Opus 4.7 is the first model released with Anthropic’s new automated cybersecurity safeguards — a direct result of the Project Glasswing announcement last week.
The safeguards automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. Anthropic experimented with differentially reducing cyber capabilities during training. While Opus 4.7’s cyber capabilities are less advanced than Mythos Preview’s, real-world deployment data from these safeguards will inform how Anthropic eventually releases Mythos-class models more broadly.
For legitimate security professionals, Anthropic launched a Cyber Verification Program for vulnerability research, penetration testing, and red-teaming.
On XBOW’s autonomous penetration testing benchmark, Opus 4.7 scored 98.5% on visual acuity versus Opus 4.6’s 54.5% — effectively eliminating the biggest pain point for computer-use security work.
🧠 What It Means for Agentic Work
The biggest story isn’t benchmarks — it’s autonomy. Opus 4.7 was designed for the shift from humans working 1:1 with AI agents to managing them in parallel:
- Loop resistance — it doesn’t get stuck repeating the same approach
- Graceful error recovery — it pushes through tool failures that used to stop Opus 4.6 cold
- Long-horizon coherence — it works coherently for hours without losing context
- File system memory — it remembers important notes across multi-session work
Devin reported that Opus 4.7 “works coherently for hours, pushes through hard problems rather than giving up, and unlocks a class of deep investigation work we couldn’t reliably run before.”
📊 Safety Profile
Opus 4.7 shows a similar safety profile to Opus 4.6 with some improvements: better honesty and resistance to prompt injection attacks. On some measures (like harm-reduction advice on controlled substances) it’s modestly weaker. Anthropic’s alignment assessment: “largely well-aligned and trustworthy, though not fully ideal.”
Mythos Preview remains Anthropic’s best-aligned model — but it’s still limited-release.
💰 Token Usage: Watch Your Bill
Two changes affect costs. First, Opus 4.7 uses an updated tokenizer — the same input can map to 1.0–1.35× more tokens depending on content. Second, it thinks more at higher effort levels, especially on later turns in agentic settings.
The net effect is favorable on coding tasks (better results per token), but Anthropic recommends measuring the difference on real traffic. Use the effort parameter, task budgets, or prompt conciseness to manage spend.
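A back-of-envelope cost estimate under the new tokenizer. The 1.0–1.35× input-token multiplier and the $5/$25 per million token prices come from the article; the traffic figures in the example are made up, and everything else is arithmetic:

```python
# Cost range under the updated tokenizer: the 1.0-1.35x inflation applies
# to input tokens (the same input can map to more tokens). Prices are the
# $5/M input, $25/M output figures from the announcement.

INPUT_PRICE = 5.00 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 25.00 / 1_000_000  # dollars per output token

def cost_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """(best, worst) case cost, applying 1.0-1.35x inflation to input only."""
    def cost(mult: float) -> float:
        return input_tokens * mult * INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return cost(1.0), cost(1.35)

# Example traffic: 800k input tokens, 50k output tokens (illustrative only).
lo, hi = cost_range(input_tokens=800_000, output_tokens=50_000)
print(f"${lo:.2f} - ${hi:.2f}")
```

The spread between `lo` and `hi` is exactly why Anthropic recommends measuring on real traffic rather than assuming a fixed multiplier.
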
🔍 The Bottom Line
Claude Opus 4.7 isn’t just an incremental update — it’s the model that makes unsupervised, hours-long AI work practical. The combination of self-verification, loop resistance, and the xhigh effort level means you can genuinely hand off your hardest tasks and trust the result.
If you’re a developer working with AI agents, this is the upgrade that changes how you work. If you’re watching the AI race, this is Anthropic’s answer to GPT-5.4 — and on coding benchmarks, it’s winning.
The real question: what happens when Mythos Preview gets these safeguards and goes wide?