Anthropic Quietly Turned Down Claude’s Thinking — And Got Caught by 7,000 Sessions of Data

Anthropic silently reduced Claude Code’s default thinking effort from “high” to “medium” and hid reasoning content from session logs — without telling users. An AMD director proved the degradation with 6,852 sessions of hard telemetry. Anthropic has admitted to the changes.

🔇 What Happened

In early March 2026, Anthropic deployed Claude Code version 2.1.69 with two changes that weren’t prominently communicated:

  1. Thinking effort default dropped from “high” to “medium” — meaning Claude spends fewer reasoning tokens before responding
  2. Thinking content redaction enabled by default — a header that strips Claude’s internal reasoning from API responses, so users can’t see what the model was actually thinking

Separately, the Claude Agent SDK for TypeScript (version 0.2.68 and later) also silently overrode the effort default from “high” to “medium” for Sonnet 4.6, breaking agentic workflows.
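
For API users, there is a way to sidestep whatever the client’s default effort happens to be that week: the Messages API accepts an explicit thinking budget per request. A minimal sketch using the `anthropic` Python SDK; the model id and token numbers are illustrative, not Claude Code’s actual defaults:

```python
# Sketch: pin the thinking budget explicitly per request, so a changed
# server-side default can't silently shrink it. Assumes the `anthropic`
# Python SDK; model id and token numbers are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model id
    max_tokens=16000,           # must exceed the thinking budget
    # Explicit budget instead of trusting the default effort level:
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Refactor the session parser."}],
)

# With thinking enabled (and redaction off), reasoning arrives as
# `thinking` content blocks alongside the final `text` block.
for block in response.content:
    if block.type == "thinking":
        print(f"visible thinking: {len(block.thinking)} characters")
```

This doesn’t fix the Claude Code client itself, but it gives direct API users a floor that no silent default change can lower.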

Most users had no way to notice. They’d send a prompt, get a response, and never see that the reasoning behind it had grown shallower. The outputs looked similar — just slightly worse, in ways that were easy to chalk up to “AI being AI.”

Then one user tried something clever: they asked Claude to reveal its own reasoning effort setting. In extended thinking mode, Claude could see a reasoning_effort tag in its own system prompt — set to 25 out of 100. That’s not “medium” effort. That’s a quarter of the maximum thinking capacity, configured by Anthropic, invisible to users, at full price.

📊 The Smoking Gun

Stella Laurenzo, director of the AI group at AMD, didn’t rely on vibes. She filed a GitHub issue with data from 6,852 Claude Code sessions, encompassing 234,760 tool calls and 17,871 thinking blocks.

The numbers tell a clear story:

  • Stop-hook violations (automated checks that catch laziness patterns like dodging ownership, stopping prematurely, and needless permission-seeking): rose from zero before March 8 to an average of 10 per day through the end of March
  • Code reads before editing: dropped from an average of 6.6 reads to just 2
  • Full file rewrites vs. targeted edits: full rewrites skyrocketed while targeted edits dropped
  • Thinking depth: the model’s reflection became visibly shallower after the changes

As Laurenzo put it: “When thinking is shallow, the model defaults to the cheapest action available: edit without reading, stop without finishing, dodge responsibility for failures, take the simplest fix rather than the correct one. These are exactly the symptoms observed.”
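
None of these metrics require special access; they fall out of counting tool calls in your own session logs. Here is a sketch of the reads-before-edit measurement over a JSONL log of tool events (the field names are assumptions, not Claude Code’s actual log schema):

```python
# Sketch: average number of Read tool calls preceding each edit, the metric
# that fell from ~6.6 to ~2 in Laurenzo's data. The JSONL event schema is
# hypothetical; adapt the field names to however you log sessions.
import json
from pathlib import Path

def reads_before_edits(log_path: Path) -> float:
    """Average count of Read calls seen since the previous Edit/Write."""
    reads_since_edit = 0
    samples: list[int] = []
    for line in log_path.read_text().splitlines():
        event = json.loads(line)
        tool = event.get("tool")  # hypothetical field name
        if tool == "Read":
            reads_since_edit += 1
        elif tool in ("Edit", "Write"):
            samples.append(reads_since_edit)
            reads_since_edit = 0
    return sum(samples) / len(samples) if samples else 0.0

# Compare a window before and after a suspected change date; a sustained
# drop in this number is exactly the laziness signal described above.
```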

🤔 Why It Matters

This isn’t just about one AI coding tool getting a bit lazier. It’s about a fundamental transparency problem in the AI industry.

When a cloud-based AI service changes its model behavior — especially making it less capable — users typically have no way to detect it. The API still returns responses. The latency is similar. The format is the same. The quality just… degrades.

Most users don’t have 7,000 sessions of telemetry to prove what they’re experiencing. They just feel like the tool is “a bit off” and can’t articulate why. That asymmetry is the real story.

Laurenzo explicitly called this out: “The uncomfortable part is most users had no data to notice it happened at all.”

🛠️ The Workaround

Anthropic admitted to the changes, and there is a fix: running /effort max forces Claude Code back to deep thinking. But this requires users to:

  1. Know the degradation happened (most didn’t)
  2. Know the workaround exists
  3. Manually apply it every session

Laurenzo also called for Anthropic to expose thinking token counts per request so users can monitor reasoning depth, and for a max-thinking tier for engineering workflows. “The current subscription model doesn’t distinguish between users who need 200 thinking tokens per response and users who need 20,000,” she wrote.
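
Until such counts are exposed, API users can approximate the monitoring Laurenzo is asking for by measuring the visible thinking in each response. A do-it-yourself sketch (it only works while redaction is disabled; the CSV file and wrapper function are my own construction, not an official feature):

```python
# Sketch: record how much visible reasoning each request carried, as a crude
# stand-in for per-request thinking token counts. Works only when thinking
# blocks aren't redacted. The CSV path and wrapper are illustrative.
import csv
import time
import anthropic

client = anthropic.Anthropic()

def tracked_create(**kwargs):
    """Wrap messages.create and log the size of any thinking blocks."""
    response = client.messages.create(**kwargs)
    thinking_chars = sum(
        len(block.thinking)
        for block in response.content
        if block.type == "thinking"
    )
    with open("thinking_depth.csv", "a", newline="") as f:
        csv.writer(f).writerow([time.time(), kwargs.get("model"), thinking_chars])
    return response
```

Plot that column over time and a quiet downgrade shows up as a step change, the same way it did in Laurenzo’s stop-hook data.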

🚪 AMD Already Left

The most telling detail: Laurenzo’s team has already switched to another provider. “We have switched to another provider which is doing superior quality work,” she wrote, while declining to name the alternative due to NDAs.

Her parting warning for Anthropic: “6 months ago, Claude stood alone in terms of reasoning quality and execution. But the others need to be watched and evaluated very carefully. Anthropic is far from alone at the capability tier that Opus previously occupied.”

That’s not trash talk. That’s a paying enterprise customer explaining exactly why they left.

⚖️ The Bigger Picture

This incident sits at the intersection of several industry trends:

  • Opaque model updates: Cloud AI services can change model behavior at any time without notification. There’s no version pinning, no changelog, no way to audit what changed
  • Cost optimization vs quality: Reducing thinking effort saves compute costs. If users can’t detect the degradation, the incentive structure favors quiet downgrades
  • Competitive pressure: With Cursor, Windsurf, Google’s Gemini Code Assist, and others heating up the AI coding market, Anthropic can’t afford to lose its reputation for reasoning quality
  • Enterprise trust: When AMD’s AI director publicly says your product can’t be trusted for complex engineering tasks, that reverberates across every enterprise evaluating AI coding tools

Anthropic has had a rough stretch. The Claude Code source code was recently leaked. Users complained about unexplained token usage surges pushing them past limits. And now this — a silent quality reduction that was only caught because one user happened to be logging everything.

🔍 The Bottom Line

Anthropic quietly made Claude Code think less and hid the evidence. An AMD director with 7,000 sessions of data caught them. Anthropic admitted it. The workaround exists (/effort max), but most users never knew they needed it. This is the core problem with cloud AI: providers can degrade your tools and you might never notice. The only defense is telemetry — logging your own usage data so you can spot regressions. Expect enterprise customers to start demanding thinking token transparency, SLA guarantees on reasoning depth, and contractual commitments to disclose model behavior changes. The era of “trust us, it’s still smart” is over.


Sources

  • The Register: “Claude Code has become dumber, lazier: AMD director” (April 6, 2026)
  • GitHub Issue #31480: anthropics/claude-code — “Opus 4.6 quality regression”
  • GitHub Issue #214: anthropics/claude-agent-sdk-typescript — “SDK 0.2.68+ silently overrides effort default”
  • Times of India: “AMD’s AI director is not happy with Anthropic’s Claude code” (April 10, 2026)