Claude Opus 4.7 Launches With 87.6% Coding Score, Takes Aim at GPT-5.4

Anthropic has rolled out Claude Opus 4.7 as its latest generally available model, and the release lands with a clear message for developers, enterprise users, and AI watchers: the company is pushing harder into the race for real-world coding, long-running task execution, and higher-value professional work. The new model is positioned as a direct upgrade to Opus 4.6, but the bigger story is not just that it is newer. It is that Anthropic says Opus 4.7 handles difficult software engineering tasks with less supervision and delivers stronger instruction-following, sharper visual understanding, and more reliable output across extended workflows.

That matters because the AI market has moved beyond simple chatbot comparisons. Buyers now want models that can manage multi-step work, hold context across longer sessions, review their own outputs, and produce material that looks usable in a real business setting. In that context, Claude Opus 4.7 arrives as a model built not only to answer questions, but to take on demanding tasks in coding, finance, design, and documentation with more consistency than its predecessor.

Anthropic’s own benchmark table shows some of the clearest gains in coding-related evaluations. In Anthropic’s official announcement, Opus 4.7 posted 64.3% on SWE-bench Pro, up from 53.4% for Opus 4.6. On SWE-bench Verified, it reached 87.6%, compared with 80.8% for the earlier version. On Terminal-Bench 2.0 for agentic terminal coding, it recorded 69.4%, again ahead of Opus 4.6 at 65.4%. These are the kinds of numbers developers and technical buyers will notice because they point to more than raw intelligence. They suggest a model that is becoming more dependable on difficult engineering work, where errors, missed steps, and loose instruction-following can quickly erode trust.

The user feedback Anthropic highlighted reinforces that theme. Early testers said Opus 4.7 catches logical faults during planning, accelerates execution, and performs better on hard coding tasks that previously still needed close oversight. Another tester described it as particularly strong in async workflows, automations, CI/CD, and long-running work. That language is important because it moves the story away from demo-friendly AI and toward systems that can support production-style tasks inside real teams.

Claude Opus 4.7 pushes harder into coding, finance, and multimodal work

The headline draw in this release is coding, but Anthropic is also framing Opus 4.7 as a broader professional model. The company says the model has substantially better vision, supporting images up to 2,576 pixels on the long edge, or roughly 3.75 megapixels. That is more than three times the visual resolution supported by prior Claude models, opening the door to tasks that depend on dense screenshots, detailed diagrams, and pixel-level references. For agent workflows, that could make a noticeable difference in reading interfaces, analyzing visual information, and working across software environments where small details matter.
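
For readers who want to sanity-check that figure, the arithmetic works out once you assume a short edge; the announcement quotes only the long edge, so the 1,456-pixel value below is an illustrative assumption chosen to give a roughly 16:9 frame.

```python
# Quick check of the stated resolution claim. Only the long edge comes
# from the announcement; the short edge is an assumed value chosen to
# approximate a 16:9 frame.
long_edge = 2576   # pixels, from the announcement
short_edge = 1456  # pixels, assumed

megapixels = long_edge * short_edge / 1_000_000
print(f"{megapixels:.2f} MP")  # ~3.75 MP, matching the quoted figure
```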

Anthropic also says Opus 4.7 performs better in finance-related work. In the benchmark table, the model scored 64.4% on Finance Agent v1.1, ahead of Opus 4.6 at 60.1%, GPT-5.4 at 61.5%, and Gemini 3.1 Pro at 59.7%. The company further claims that internal testing showed stronger finance analysis, more rigorous modeling, more professional presentations, and tighter task integration. That gives the launch a wider business angle than a pure developer story. For enterprise buyers, it suggests Anthropic is targeting knowledge work that extends beyond code generation into higher-value analytical and presentation tasks.

Another notable claim is around memory. Anthropic says Opus 4.7 is better at using file-system-based memory, allowing it to carry forward important notes across long, multi-session work. That may sound like a small technical detail, but it speaks directly to a common complaint about AI productivity tools: too much repeated context-setting. A model that can retain and apply useful working memory across sessions can reduce friction and make longer projects feel more practical.
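
Anthropic has not published the mechanics behind this, but the concept is easy to picture. The sketch below is a hypothetical illustration of file-system-based memory, not Anthropic's implementation: an agent appends working notes to a file during one session and reloads them at the start of the next. The path, format, and helper names are all assumptions.

```python
from pathlib import Path

# Illustrative sketch only: a file-backed scratchpad of the kind the
# announcement describes. The path and format are assumptions, not
# Anthropic's actual memory implementation.
NOTES = Path("memory/notes.md")

def remember(note: str) -> None:
    """Append a working note so a later session can pick it up."""
    NOTES.parent.mkdir(parents=True, exist_ok=True)
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")

def recall() -> str:
    """Load all notes carried over from earlier sessions."""
    return NOTES.read_text(encoding="utf-8") if NOTES.exists() else ""

remember("Auth service uses JWT; refresh tokens live in Redis.")
print(recall())  # would be injected into the next session's context
```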

The release also includes platform-side updates that make the launch more commercially interesting. Anthropic introduced a new xhigh effort level between high and max, giving users finer control over the balance between reasoning depth and latency. On the API side, it is launching task budgets in public beta, aimed at helping developers manage token spend across longer-running tasks. In Claude Code, Anthropic is adding a new /ultrareview command for more dedicated code review sessions, while also extending auto mode to Max users so longer tasks can run with fewer interruptions.
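
Anthropic has not documented the exact request shape for these controls, so the following is a hypothetical sketch built on the standard Anthropic Python SDK. The messages.create call is the SDK's real entry point, but the model identifier, the effort field, the beta header value, and the task budget payload are all assumptions inferred from the announcement's wording.

```python
# Hypothetical sketch of the new controls described above. The model id,
# the "effort" field, and the task-budget beta flag are assumptions based
# on the announcement, not confirmed API parameters.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",  # assumed model identifier
    max_tokens=4096,
    extra_headers={
        "anthropic-beta": "task-budgets-2026",  # assumed beta flag name
    },
    extra_body={
        "effort": "xhigh",  # the new level between high and max
        "task_budget": {"max_tokens": 500_000},  # assumed payload shape
    },
    messages=[{"role": "user", "content": "Refactor the billing module."}],
)
print(response.content[0].text)
```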

Pricing stays the same, but the migration story comes with trade-offs

For developers already using Opus 4.6, Anthropic is presenting 4.7 as a direct upgrade, though not one without operational considerations. Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, which removes one obvious barrier to adoption. The model is available across Claude products and through the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, giving the release broad distribution from day one.

Still, Anthropic says users should plan for changes in token usage. Opus 4.7 uses an updated tokenizer, meaning the same input may map to roughly 1.0x to 1.35x as many tokens as before, depending on content type. The model also tends to think more at higher effort levels, especially in agentic settings, which can increase output token usage. Anthropic argues that the overall effect is favorable in its internal coding evaluation, but for teams running high-volume workloads, cost behavior under real traffic will matter as much as benchmark performance.
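
To see what that range can mean in dollar terms, here is a rough sensitivity check using the published prices. The monthly token volumes are assumed for illustration, and the sketch applies the tokenizer multiplier to input only; any extra thinking at higher effort levels would raise the output side as well.

```python
# Rough cost sensitivity check using the published pricing and the
# announced tokenizer range. The monthly volumes are an assumed workload.
INPUT_PRICE = 5 / 1_000_000    # $ per input token
OUTPUT_PRICE = 25 / 1_000_000  # $ per output token

monthly_input = 200_000_000    # assumed input tokens, old tokenizer
monthly_output = 50_000_000    # assumed output tokens

for multiplier in (1.0, 1.35):
    cost = monthly_input * multiplier * INPUT_PRICE + monthly_output * OUTPUT_PRICE
    print(f"{multiplier:.2f}x input tokens -> ${cost:,.0f}/month")
# 1.00x -> $2,250/month; 1.35x -> $2,600/month on this assumed workload
```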

There is also a security dimension to this launch. Anthropic recently discussed the risks and benefits of increasingly capable AI systems in cybersecurity, and it says Opus 4.7 is the first model being released with new safeguards designed to detect and block prohibited or high-risk cyber requests. The company says the model is less cyber-capable than Mythos Preview and that it experimented during training with reducing those capabilities. For legitimate security professionals, Anthropic is opening a Cyber Verification Program covering uses such as red-teaming, penetration testing, and vulnerability research.

That may not be the most clickable part of the launch, but it does matter for trust. As model capability increases, large buyers want to know not only what a model can do, but also where the provider is drawing boundaries. Anthropic is clearly trying to show that it can keep improving performance while still managing release risk around more sensitive use cases.

The broader market takeaway is that Claude Opus 4.7 looks like a meaningful step forward in the segment that matters most right now: AI that can do serious work rather than simply talk about it. Its benchmark improvements over Opus 4.6, especially in software engineering, its stronger finance and multimodal claims, and its focus on long-running, self-checking workflows all point to a model designed for practical deployment. The headline is not just that Anthropic has released another model. It is that the competition around coding agents, professional AI tools, and enterprise-grade automation is becoming more intense, and Claude Opus 4.7 has entered that fight with sharper numbers and a more concrete real-world pitch.
