Code Compression
Technology & People



On March 13, 2026, Anthropic announced that Claude’s 1M token context window was “now generally available” at “standard pricing, with no special pricing tier, no beta header, no asterisks.” Three weeks later, developers discovered their bills had doubled.

The disconnect reveals a hidden truth about AI economics: large context windows are expensive, and someone has to pay. For developers, that “someone” is increasingly themselves.

The Harvey Tweet That Started It

On March 30, 2026, developer Harvey (yorkeccak) tweeted: “Opus 4.6 1Mil context now billed as extra usage… Guess we are back to /compact-ing our way through life.”

Why Context Costs Money

Large language models don’t just process input; they store it. The KV cache (key-value cache) holds the intermediate attention state for every token in your conversation, and it grows linearly with context length: every token you keep in the window occupies accelerator memory for as long as the conversation lives. Serving a 1M-token window therefore ties up vastly more memory per request than a typical chat session, and that memory is what providers ultimately bill for.
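A back-of-envelope calculation makes the scaling concrete. The model dimensions below (layer count, KV heads, head size) are illustrative assumptions, not any particular model’s real configuration:

```python
# Back-of-envelope KV cache memory for a transformer, per conversation.
# All model dimensions here are illustrative assumptions, not any
# specific model's real configuration.

def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim,
                   bytes_per_value=2):
    """Keys + values stored for every token, at every layer (fp16)."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return num_tokens * per_token

# Hypothetical model: 64 layers, 8 KV heads of dimension 128, fp16.
small = kv_cache_bytes(8_000, 64, 8, 128)      # typical chat session
large = kv_cache_bytes(1_000_000, 64, 8, 128)  # full 1M-token window

print(f"8K tokens: {small / 2**30:.2f} GiB")   # ~1.95 GiB
print(f"1M tokens: {large / 2**30:.2f} GiB")   # ~244 GiB
```

Under these assumptions, a single full-window conversation pins roughly a quarter terabyte of accelerator memory, which is why large contexts are hard to price like ordinary requests.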

Seven Methods of Context Compression

1. LLM Summarization (70-90% compression)

The most widely deployed approach: an LLM rewrites the conversation history into organized sections, keeping decisions and open threads while dropping detail that later turns are unlikely to need.

2. Opaque Compression (variable compression)

OpenAI’s Codex compresses history server-side via /responses/compact. The client never sees what was kept or dropped, hence “opaque”: the compression ratio varies and is not under the developer’s control.

What This Means for Developers

The pricing confusion is a symptom of a deeper reality: AI companies are still figuring out how to price large context windows.

Sources

  • GitHub Issue #29289: “Max plan Extra Usage charges during usage reporting outage” (March 2026)
  • Anthropic Blog: “1M context is now generally available for Opus 4.6 and Sonnet 4.6” (March 13, 2026)