
Anthropic Slashed Claude Code's Cache TTL From 1 Hour to 5 Minutes — Users Are Paying the Price

Anthropic silently cut Claude Code's cache TTL from 1 hour to 5 minutes. Users armed with 120K API calls' worth of log data show they're overpaying by up to 25% — and hitting subscription quotas in under an hour.

Anthropic · Claude Code · AI pricing · prompt caching · transparency

Anthropic silently changed Claude Code’s prompt cache time-to-live from 1 hour to 5 minutes in early March 2026. No announcement. No changelog. No warning. Users who dug into their session logs discovered they’ve been overpaying by 17–25% — and hitting subscription quotas they never reached before.

This is the second silent degradation Claude Code users have caught in a month. The first was a thinking effort reduction from “high” to “medium.” Now it’s the cache. Two different changes, two different mechanisms, one pattern: Anthropic changes things under the hood and doesn’t tell the people paying the bill.


💾 WHAT PROMPT CACHING IS — AND WHY TTL MATTERS

When you use Claude Code, the system caches your conversation context — your codebase summary, your instructions, your prior messages — so it doesn’t have to reprocess everything from scratch on each turn. This is prompt caching, and it’s the reason long coding sessions are affordable at all.

Cache writes are expensive. Cache reads are cheap. For Sonnet 4.6, a 5-minute-tier cache write costs $3.75 per million tokens; a cache read costs $0.30 — 12.5× cheaper. For Opus, the rates are $6.25/MTok to write versus $0.50/MTok to read.

The TTL — time-to-live — determines how long that cached content stays warm. If the TTL is 1 hour and you send another message within 60 minutes, you get a cheap cache read. If the TTL is 5 minutes and you step away for a coffee break, your entire cached context expires. On your next turn, every token gets re-uploaded as a fresh cache write at full price.

For long coding sessions — the primary Claude Code use case — this creates a compounding penalty. The longer your session, the more context you have cached, and the more expensive each cache expiry becomes.
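To make the penalty concrete, here is a toy cost model using the Sonnet 4.6 rates quoted above. The 200K-token context and 12-minute break are invented numbers for illustration, not figures from the analysis:

```python
# Toy model of one turn's caching cost, using the Sonnet 4.6 rates
# quoted above. The 200K-token context and 12-minute gap are invented
# numbers for illustration.
WRITE_5M = 3.75 / 1e6   # $/token, 5-minute-tier cache write
READ = 0.30 / 1e6       # $/token, cache read

def turn_cost(cached_tokens, gap_minutes, ttl_minutes):
    """A cheap read if the cache is still warm; a full re-write of the
    cached context at the write rate if the TTL has lapsed."""
    if gap_minutes <= ttl_minutes:
        return cached_tokens * READ
    return cached_tokens * WRITE_5M

context = 200_000  # tokens of cached codebase + conversation
for ttl in (5, 60):
    print(f"TTL {ttl:>2} min, 12-min break: ${turn_cost(context, 12, ttl):.2f}")
# TTL  5 min: $0.75 — cache expired, full re-write
# TTL 60 min: $0.06 — still warm, cheap read
```

One coffee break past the TTL turns a $0.06 turn into a $0.75 one, and that 12.5× gap recurs on every expiry — growing with the context, exactly as the paragraph above describes.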


📊 THE DATA: 119,866 API CALLS DON’T LIE

GitHub user seanGSISG analyzed 119,866 API calls across two independent machines (Linux workstation + Windows laptop, different accounts) from January 11 to April 11, 2026. The data comes directly from Claude Code’s own session logs — ~/.claude/projects/**/*.jsonl files — which include per-message usage breakdowns with ephemeral_5m_input_tokens and ephemeral_1h_input_tokens fields.
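A tally in the spirit of that analysis can be sketched in a few lines of Python. The exact JSONL nesting assumed here (usage under `message.usage`, with a `cache_creation` breakdown, plus a top-level `timestamp`) is an assumption — inspect a line of your own logs before relying on it:

```python
import glob
import json
import os
from collections import defaultdict

def tally_ttl_tokens(pattern="~/.claude/projects/**/*.jsonl"):
    """Per-day totals of 5m- vs 1h-tier cache-creation tokens.

    Field names follow the ephemeral_5m/ephemeral_1h breakdown the
    analysis cites; the surrounding JSON nesting is assumed.
    """
    per_day = defaultdict(lambda: {"5m": 0, "1h": 0})
    for path in glob.glob(os.path.expanduser(pattern), recursive=True):
        with open(path) as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip any non-JSON line
                usage = (rec.get("message") or {}).get("usage") or {}
                breakdown = usage.get("cache_creation") or {}
                day = (rec.get("timestamp") or "")[:10]  # YYYY-MM-DD
                per_day[day]["5m"] += breakdown.get("ephemeral_5m_input_tokens", 0) or 0
                per_day[day]["1h"] += breakdown.get("ephemeral_1h_input_tokens", 0) or 0
    return dict(per_day)
```

Plotting the two per-day series is enough to make a flip like the one dated March 6 visible at a glance.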

The timeline is unambiguous:

| Period | Dates | Behavior | Evidence |
|---|---|---|---|
| Phase 1 | Jan 11–31 | 5m only | 1h tier not yet available |
| Phase 2 | Feb 1–Mar 5 | 1h only | Zero 5m tokens across 33+ consecutive days on both machines |
| Phase 3 | Mar 6–7 | Transition | First 5m tokens reappear |
| Phase 4 | Mar 8–Apr 11 | 5m dominant | 5m tokens surge; 1h becomes minority or disappears |

March 6 is when 5m tokens first reappear after 33 days of clean 1h-only behavior across two independent accounts. By March 8, 5m tokens outnumber 1h by 5:1.


💰 THE COST: $949 OVERPAID (AT MINIMUM)

Using official Anthropic pricing, the analysis quantifies the damage for Sonnet 4.6:

| Month | Actual Cost | Cost with 1h TTL | Overpaid | % Waste |
|---|---|---|---|---|
| January | $78.99 | $37.54 | $41.45 | 52.5% |
| February | $1,120.43 | $1,108.11 | $12.32 | 1.1% |
| March | $2,776.11 | $2,057.01 | $719.09 | 25.9% |
| April | $1,193.01 | $1,016.78 | $176.23 | 14.8% |
| Total | $5,561.17 | $4,612.09 | $949.08 | 17.1% |

February — the month with 1h TTL default — shows just 1.1% waste. Every other month shows 15–53% overpayment from 5m cache re-creations. The percentages are identical across Sonnet and Opus tiers because the waste is driven purely by the 5m/1h token split, not per-token pricing.

For Opus users, the total overpayment comes to $1,581.80 across the same period.

Over the three months: 220 million tokens written to the 5m tier, generating 5.7 billion cache reads. Had those 220M tokens been on the 1h tier, re-accesses within the hour would have been reads instead of re-creations.
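As a sanity check on the scale, here is a rough upper bound on that waste, under the deliberately generous (and hypothetical) assumption that every one of those 220M 5m-tier write tokens re-created context that a warm 1h cache would have served as a read:

```python
# Generous upper bound: assume all 220 MTok of 5m writes were pure
# re-creations that a warm 1h cache would have served as reads.
# (A hypothetical simplification — some of those writes were genuinely
# new content, so the real waste is lower.)
MTOK_5M_WRITES = 220      # from the session-log analysis
WRITE_5M = 3.75           # $/MTok, Sonnet 4.6 5m-tier cache write
READ = 0.30               # $/MTok, cache read

upper_bound = MTOK_5M_WRITES * (WRITE_5M - READ)
print(f"waste upper bound: ${upper_bound:,.0f}")  # → waste upper bound: $759
```

The bound lands in the same ballpark as the reported $949 Sonnet total, so the claimed order of magnitude is plausible on the published numbers alone.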


⏱️ THE QUOTA CRISIS

Cost is only half the story. Pro and Max subscription users are quota-limited — and cache creation tokens count toward quota at full rate.

Multiple users report that before March, they never hit their 5-hour quota limit. After the TTL change, sessions that previously consumed an hour of quota now burn through it in 20 minutes. One user described getting “about 4 hours of real use” out of a 24-hour day — because they exhaust their quota, wait 4 hours for reset, and repeat.

This is the mechanism: shorter TTL → more cache writes → more quota consumption → hitting limits faster → unable to work. The cost is denominated in dollars for API users and in lost hours for subscription users.


🏢 ANTHROPIC’S RESPONSE: “IT’S CHEAPER, ACTUALLY”

Anthropic’s Jarred Sumner (of Bun fame, now at Anthropic) responded to the issue with a detailed explanation that amounts to: the change was intentional, and it actually saves users money overall.

The argument: not all Claude Code requests benefit from 1h TTL. Subagent calls and one-shot requests that won’t be revisited are cheaper with 5m TTL because 5m cache writes cost less than 1h writes (roughly 1.25× base input vs 2×). If you write to cache and never read it back, the cheaper write rate wins.
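The break-even behind this argument is easy to sketch. The accounting below is a simplified model using the multipliers just cited (5m writes at 1.25× base input, 1h writes at 2×, reads at 0.1×, matching the $0.30-vs-$3.00 Sonnet figures earlier) — not Anthropic's actual selection logic:

```python
# Relative cost (in multiples of the base input price) of keeping one
# context cached over an hour, where it is accessed n times with more
# than 5 minutes between accesses. Simplified model, not Anthropic's
# real accounting: 5m write 1.25x, 1h write 2x, read 0.1x.
WRITE_5M, WRITE_1H, READ = 1.25, 2.00, 0.10

def hour_cost(n_accesses, ttl):
    if ttl == "5m":
        # every access after a >5 min gap re-creates the cache
        return n_accesses * WRITE_5M
    # one 1h write up front, then every later access is a cheap read
    return WRITE_1H + (n_accesses - 1) * READ

for n in (1, 2, 3):
    print(f"{n} access(es): 5m={hour_cost(n, '5m'):.2f}x  1h={hour_cost(n, '1h'):.2f}x")
# 1 access: the cheaper 5m write wins — the subagent/one-shot case
# 2+ accesses: 1h wins — the long interactive-session case
```

On this model the 1h tier pays for itself as soon as a context is revisited even once after a gap — which is precisely the long-session pattern the complaints describe.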

Sumner also disclosed that the client now selects TTL per request based on expected cache-reuse patterns — main conversation turns (likely to be revisited) get 1h, subagent calls (typically one-shot) get 5m. The March 6 change was part of this optimization.

A separate bug in v2.1.90 could cause sessions that hit quota limits to stay on 5m TTL until the session exited — fixed in that version.

Independent data from another user (@spm1001, 407K API turns) corroborates the per-request selection pattern: main turns mostly stay on 1h (0–6% 5m after the bug fix), while subagent turns are 100% 5m.


🤨 WHY USERS AREN’T BUYING IT

Anthropic’s explanation is technically coherent but leaves several problems unaddressed:

1. No advance notice. A change that materially affects costs and quota consumption was deployed without any announcement. Users discovered it by auditing their own logs.

2. The quota problem remains. Whether per-request selection is “optimal” in aggregate doesn’t matter to users who can no longer finish a work session. As one user put it: “User has already paid the monthly subscription cost. The money is already in Anthropic’s bank account. What we are seeing instead is we exhaust the session quota under an hour.”

3. No user control. Users can’t choose their own TTL. A configurable option — 5m, 15m, 1h — would let power users optimize for their own usage patterns. Anthropic says 1h everywhere would increase costs, but that’s a statement about the average, not about every user.

4. Trust erosion. This is the second silent backend change in a month. The thinking effort degradation was caught by an AMD director with 7,000 sessions of data. This one was caught by a user with 120K API calls. Both required extraordinary diligence to detect.

As one commenter put it: “Jarred’s response was very informative and very transparent, but came at the wrong time. Instead of having this answer as a post-mortem, it should have been a pre-mortem.”


🔗 THE PATTERN: ANTHROPIC’S TRANSPARENCY PROBLEM

These incidents are not isolated. They form a pattern:

  • Thinking effort reduction — silently changed from “high” to “medium” in March
  • Thinking content redaction — reasoning hidden from logs by default
  • Cache TTL change — 1h to 5m, no announcement
  • Quota counting opacity — users still don’t know exactly how cache reads count toward quota (#45756 remains unresolved)

Each change was technically justified. Each was deployed without telling users. Each was only caught because someone was logging everything.

The common thread isn’t that Anthropic is malicious — it’s that the default posture is “change first, explain later.” For a company whose stated mission is AI safety and responsible development, the opacity around changes that affect customer costs is jarring.


🛠️ WHAT USERS CAN DO

If you’re a Claude Code user, there are mitigations:

  • Keep sessions short and focused — one task per session reduces the impact of cache expiry
  • Compact conversations before stepping away — /compact reduces the context that needs re-caching
  • Front-load critical context in CLAUDE.md — cache creation tokens spent on high-value content at least deliver ROI
  • Upgrade to v2.1.90+ — fixes the bug that kept quota-exhausted sessions on 5m TTL
  • Monitor your own usage — check ~/.claude/projects/ JSONL files for cache_creation token breakdowns
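For that last bullet, a minimal monitoring sketch — the `cache_creation_input_tokens` field name and the JSON nesting are assumptions, so check a log line from your own machine first:

```python
import glob
import json
import os
from collections import Counter

def daily_cache_creation(pattern="~/.claude/projects/**/*.jsonl"):
    """Sum assumed cache_creation_input_tokens per day across session logs."""
    totals = Counter()
    for path in glob.glob(os.path.expanduser(pattern), recursive=True):
        with open(path) as f:
            for line in f:
                try:
                    rec = json.loads(line)
                except json.JSONDecodeError:
                    continue
                usage = (rec.get("message") or {}).get("usage") or {}
                day = (rec.get("timestamp") or "")[:10]  # YYYY-MM-DD
                totals[day] += usage.get("cache_creation_input_tokens", 0) or 0
    return dict(totals)

if __name__ == "__main__":
    for day, tokens in sorted(daily_cache_creation().items()):
        print(day, f"{tokens:,}")
```

A sudden jump in daily cache-creation tokens is the same signal the original analysis used to date the TTL change.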

But the real ask is simpler: Anthropic needs to announce infrastructure changes that affect costs before deploying them. Not after. Not when someone files a GitHub issue with 120K data points. Before.


🔍 THE BOTTOM LINE

Anthropic changed Claude Code’s cache TTL from 1 hour to 5 minutes without telling anyone. Data from 119,866 API calls proves it happened on March 6, 2026. Anthropic confirmed the date and says it was an optimization — per-request TTL selection that saves money on average.

“On average” doesn’t help the user whose quota now expires in 20 minutes. The optimization may be real in aggregate, but individual users with long coding sessions — exactly the power users who pay for Max subscriptions — are measurably worse off. One user overpaid by $949 in three months.

This is the second silent change caught in a month. The thinking effort reduction. The cache TTL change. Two different mechanisms, one pattern: Anthropic deploys first and explains when caught. As one GitHub commenter put it: “Every change that touches consumption limits or has a probability to change how customers are billed — this must be announced well in advance.”

The fix isn’t technical. It’s cultural. Anthropic needs to treat cost-impacting infrastructure changes the same way they’d treat a model behavior change — with advance disclosure, clear documentation, and user control. The era of “trust us, it’s optimized” is over. Users have the logs. They’re watching.


SOURCES

  • GitHub Issue #46829: anthropics/claude-code — Cache TTL regression
  • GitHub Issue #45756: anthropics/claude-code — Pro Max quota exhaustion
  • Anthropic Prompt Caching Documentation (2026)