Claude Opus 4.7 for software engineering

Anthropic's latest high-end model, Opus 4.7, promises game-changing behavior, but has some significant regressions.

Anthropic just dropped Opus 4.7, to a largely negative reception. Opus 4.5 was a game-changer, and Opus 4.6 brought meaningful context and agentic improvements. Anthropic positioned 4.7 as the model you can hand hard work off to and walk away from.

Even though the benchmarks look strong, users are reporting that the model is a significant step backwards in practice, so engineering teams should pay attention before migrating.

What Anthropic is claiming

The headline claim for Opus 4.7 is that it follows through on complex tasks and self-verifies before reporting back. Anthropic also boasts that Opus 4.7 will run proofs on systems code before starting a task. Together, these changes should let it work longer unsupervised, since it’s less likely to go off track.

Reports from Claude Code users

Users relying on CLAUDE.md or custom system prompts are reporting that 4.7 ignores them. They’ve also found that the model invents web searches, makes up packages that don’t exist, and even fabricates context mid-conversation.

Another recurring complaint is that the model tries to “wrap up” or “pick this up later” during tasks where 4.6 would keep working. For long-running agentic tasks, this is a major regression.

Users also mention that 4.7 is more agreeable than 4.6, which leads it to validate wrong approaches and then carry them out.

Issues with instruction following

Anthropic claims that 4.7 is even more thorough and takes prompts more literally. However, users complain that the model ignores instructions more often. This is likely because 4.7 is highly sensitive to how prompts are written, so prompts that worked well on 4.6 can break in unpredictable ways. Expect to re-tune your prompts before using 4.7 in high-impact workloads.

Changes to token usage

Opus 4.7 comes with a new xhigh effort level, which sits between high and max. Claude Code’s default effort has been bumped to xhigh.

Anthropic’s internal testing paints a net-favorable token picture: more work is done per token at the same effort levels. In practice, though, users report burning through limits faster for output that doesn’t feel any better.

Here’s how to keep track of your Claude Code token usage.
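If you’re calling the model through the API rather than Claude Code, you can also log usage directly off each response. A minimal sketch using the Anthropic Python SDK; the model ID here is a placeholder, so check Anthropic’s model list for the real 4.7 identifier:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",  # placeholder ID; confirm against Anthropic's model list
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the failing tests in this repo."}],
)

# Every response carries a usage block you can log per call
usage = response.usage
print(f"input tokens:  {usage.input_tokens}")
print(f"output tokens: {usage.output_tokens}")
```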

New Claude Code features worth knowing

These ship alongside the model:

  • Pro and Max users get a free trial of /ultrareview, which thoroughly reads through code changes and flags bugs
  • Claude auto mode handles permission decisions for longer tasks, which reduces interruptions
  • Developers can use task budgets to guide Claude’s token spend across longer runs (a client-side sketch follows this list)
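Task budgets are a Claude Code feature, but if you’re orchestrating the model over the API you can approximate the idea client-side by tracking cumulative usage and cutting the run off once it exceeds a cap. A rough sketch, again assuming a placeholder model ID:

```python
import anthropic

client = anthropic.Anthropic()
TOKEN_BUDGET = 50_000  # total input + output tokens allowed for this task

def run_with_budget(steps: list[str]) -> None:
    spent = 0
    for step in steps:
        response = client.messages.create(
            model="claude-opus-4-7",  # placeholder ID
            max_tokens=2048,
            messages=[{"role": "user", "content": step}],
        )
        spent += response.usage.input_tokens + response.usage.output_tokens
        print(f"after step: {spent}/{TOKEN_BUDGET} tokens spent")
        if spent >= TOKEN_BUDGET:
            print("budget exhausted; stopping the run here")
            break

run_with_budget([
    "Outline a plan to fix the flaky auth tests.",
    "Implement step 1 of the plan.",
])
```

Inside Claude Code itself you’d use the built-in task budget setting instead; this loop only applies to API-driven orchestration.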

Should you migrate to 4.7?

For most engineering teams, Opus 4.6 is still the smarter choice. It’s more stable, and your existing prompts and workflows are already tuned to it. Anthropic does deprecate older models eventually, but given the feedback on 4.7, 4.6 is unlikely to be retired anytime soon.

If you do want to test 4.7, start with a smaller, low-stakes task. Prompts don’t necessarily carry over, and token spend is higher, so take notes and measure how task inputs and outputs compare. Anthropic has a migration guide that’s worth reading if you’re looking into this.
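One way to make that comparison concrete is to run the same task prompt against both models and diff the answers and token counts. A minimal sketch, assuming placeholder model IDs for 4.6 and 4.7:

```python
import anthropic

client = anthropic.Anthropic()

# Placeholder IDs; substitute the real ones from Anthropic's model list
MODELS = ["claude-opus-4-6", "claude-opus-4-7"]
TASK = "Write a pytest regression test for an off-by-one error in pagination."

for model in MODELS:
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": TASK}],
    )
    answer = response.content[0].text  # first content block holds the text reply
    usage = response.usage
    print(f"=== {model} ===")
    print(f"tokens: {usage.input_tokens} in / {usage.output_tokens} out")
    print(answer[:500])  # eyeball the first chunk of each answer
```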

Environments for Claude Code

Claude Code works best when you give it access to on-demand preview environments so it can validate the code it writes. Your agent can push code to environments, view changes live, pull logs, and run tests. This means you won’t need to supervise your agent as much, since it can self-check to make sure things are on the right track.
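As a concrete example of that self-checking, here’s a hedged sketch of a validation script an agent could run after pushing to a preview environment. The PREVIEW_URL variable, the /healthz endpoint, and the tests reading BASE_URL are all assumptions for illustration, not Shipyard specifics:

```python
import os
import subprocess
import sys
import urllib.request

# Assumed to be injected by the environment provider (hypothetical variable name)
preview_url = os.environ["PREVIEW_URL"]

# 1. Confirm the deployed preview is actually up (/healthz path is an assumption)
with urllib.request.urlopen(f"{preview_url}/healthz", timeout=10) as resp:
    if resp.status != 200:
        sys.exit(f"preview unhealthy: HTTP {resp.status}")

# 2. Run the test suite against the live preview (tests assumed to read BASE_URL)
result = subprocess.run(
    ["pytest", "--maxfail=1"],
    env={**os.environ, "BASE_URL": preview_url},
)
sys.exit(result.returncode)
```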

Shipyard makes these workflows easy, for you and for your agent. Try it free today for 30 days, and watch your agents ship higher-quality code.
