AI-generated
© Publico
AI & Tech·yesterday

Anthropic's Claude Opus 4.8 Is 4x More Honest, Adds Cheaper Fast Mode and Parallel Subagents

Anthropic's Claude Opus 4.8 promises fewer silent failures, a dramatically cheaper fast mode, and the ability to spawn hundreds of parallel agents.

Improved Honesty

Anthropic released Claude Opus 4.8 on Thursday, emphasizing the model's "honesty" as its standout feature. The company says it is around four times less likely than its predecessor, Opus 4.7, to allow flaws in its own code to go unremarked. The model is more likely to flag uncertainties and less prone to making unsupported claims, addressing a persistent issue where AI models project confidence even when wrong.

Claude Opus 4.8 has noticeably better judgment. In Claude Code, it asks the right questions, catches its own mistakes, pushes back when a plan isn't sound, and builds up confidence around complex, multi-service explorations before making big changes.

Benchmark Performance

On standard benchmarks, Opus 4.8 delivers modest but tangible improvements. It scores 88.6% on SWE-bench Verified (up from 87.6%), 69.2% on agentic coding (Terminal-Bench 2.1, up from 64.3%), and 57.9% on multidisciplinary reasoning with tools. It also leads Vals AI's vibe coding benchmark, scoring 10% above the previous Anthropic model and ahead of every other publicly available model.

Opus 4.8 Benchmark Scores (%)
SWE-bench Verified: 88.6
Terminal-Bench 2.1: 69.2
Multidisciplinary Reasoning: 57.9
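To put the "modest but tangible" gains in perspective, the two benchmarks with a stated earlier score can be compared both in percentage points and as a relative cut in the failure rate. The following is a rough sketch using only the figures quoted above; the `gains` helper and the failure-rate framing are illustrative assumptions, not a method Anthropic describes.

```python
# Scores in percent as quoted in the article: (earlier score, Opus 4.8 score).
SCORES = {
    "SWE-bench Verified": (87.6, 88.6),
    "Terminal-Bench 2.1": (64.3, 69.2),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative reduction in failure rate in %).

    Hypothetical helper for illustration only.
    """
    point_gain = new - old
    # Failure rate is the share of tasks not solved, i.e. 100 - score.
    failure_reduction = (new - old) / (100.0 - old) * 100.0
    return round(point_gain, 1), round(failure_reduction, 1)


for name, (old, new) in SCORES.items():
    points, reduction = gains(old, new)
    print(f"{name}: +{points} points, {reduction}% fewer failures")
```

On these numbers, SWE-bench Verified improves by 1.0 point (about 8.1% fewer failed tasks), while Terminal-Bench 2.1 improves by 4.9 points (about 13.7% fewer failed tasks).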

New Features

Alongside the model, Anthropic introduced dynamic workflows in research preview, allowing Claude to plan tasks and spawn hundreds of parallel subagents within a single session. Users can now control the "effort" level of responses, choosing between faster, cheaper replies and deeper, more token-intensive reasoning. Fast mode has become three times cheaper, dropping to $10 per million input tokens and $50 per million output tokens, while regular pricing remains $5/$25.
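As a rough illustration of what the quoted prices mean for a single request, the sketch below computes the cost in each mode. The per-million-token prices come from the figures above; the `request_cost` helper, the mode names, and the example token counts are hypothetical and are not part of any Anthropic API.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES = {
    "regular": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def request_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request in the given pricing mode.

    Hypothetical helper for illustration only.
    """
    price = PRICES[mode]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Assumed workload: 200,000 input tokens and 20,000 output tokens.
regular = request_cost("regular", 200_000, 20_000)
fast = request_cost("fast", 200_000, 20_000)
print(f"regular: ${regular:.2f}, fast: ${fast:.2f}")
# prints "regular: $1.50, fast: $3.00"
```

At these rates, fast mode costs twice as much as regular mode for the same token counts.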

Alignment and Mythos

Anthropic assessed Opus 4.8's prosocial traits and found them comparable to those of its most aligned model, Claude Mythos Preview. The company confirmed that Mythos-class models, which have already found over 10,000 critical software vulnerabilities, will be released to all customers "in the coming weeks" once additional safeguards are in place.

"Opus 4.8 is around 4x less likely than its predecessor to allow flaws in code it's written to pass unremarked." (Anthropic blog post)

Market Impact

The launch comes amid fierce competition with OpenAI and Google. Early testers such as Cursor, Cognition, and Databricks reported improvements in tool usage and agentic reasoning. Databricks noted a "step change in agentic reasoning" and 61% lower token costs than with Opus 4.7 for its Genie data agent.
