Code Compression
Technology & People



On March 13, 2026, Anthropic announced that Claude’s 1M token context window was “now generally available” at “standard pricing, with no special pricing tier, no beta header, no asterisks.” Three weeks later, developers discovered their bills had doubled.

The disconnect reveals a hidden truth about AI economics: large context windows are expensive, and someone has to pay. For developers, that “someone” is increasingly themselves.

The Harvey Tweet That Started It

On March 30, 2026, developer Harvey (yorkeccak) tweeted: “Opus 4.6 1Mil context now billed as extra usage… Guess we are back to /compact-ing our way through life.”

Why Context Costs Money

Large language models don’t just process input; they store it. The KV cache (key-value cache) holds the intermediate attention state for every token in your conversation, and it grows linearly with context length: every token you keep in the window occupies accelerator memory for as long as the conversation lives. Serving a 1M-token window therefore ties up vastly more memory per request than a typical chat session, and that memory is what providers ultimately bill for.
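A back-of-envelope calculation makes the scaling concrete. The model dimensions below (layer count, KV heads, head size) are illustrative assumptions, not any particular model’s real configuration:

```python
# Back-of-envelope KV cache memory for a transformer, per conversation.
# All model dimensions here are illustrative assumptions, not any
# specific model's real configuration.

def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2):
    """Keys + values stored for every token, at every layer (fp16)."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return num_tokens * per_token

# Hypothetical model: 64 layers, 8 KV heads of dimension 128, fp16.
small = kv_cache_bytes(8_000, 64, 8, 128)      # typical chat session
large = kv_cache_bytes(1_000_000, 64, 8, 128)  # full 1M-token window

print(f"8K tokens: {small / 2**30:.2f} GiB")   # ~1.95 GiB
print(f"1M tokens: {large / 2**30:.2f} GiB")   # ~244 GiB
```

Under these assumptions, a single full-window conversation pins roughly a quarter terabyte of accelerator memory, which is why large contexts are hard to price like ordinary requests.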

Seven Methods of Context Compression

1. LLM Summarization (70-90% compression)

The most widely deployed approach: an LLM rewrites the conversation history into organized sections, keeping decisions and open threads while dropping detail that later turns are unlikely to need.

2. Opaque Compression (variable compression)

OpenAI’s Codex compresses history server-side via /responses/compact. The client never sees what was kept or dropped, hence “opaque”: the compression ratio varies and is not under the developer’s control.

What This Means for Developers

The pricing confusion is a symptom of a deeper reality: AI companies are still figuring out how to price large context windows.

Sources

  • GitHub Issue #29289: “Max plan Extra Usage charges during usage reporting outage” (March 2026)
  • Anthropic Blog: “1M context is now generally available for Opus 4.6 and Sonnet 4.6” (March 13, 2026)