[Cover image: overhead documentary-style shot of a data center corridor, server racks with glowing blue lights]

DeepSeek V4 Matches GPT-5.5 at 86% Less Cost — and It's 100% Open Source

DeepSeek V4 matches GPT-5.5 on benchmarks at a fraction of the cost, with a million-token context window and fully open weights. China's caught up.

Tags: DeepSeek · Open Source AI · GPT-5.5 · China AI · LLM Benchmarks

Three Chinese AI labs released frontier open-source models in four days. The last one — DeepSeek V4 — just matched GPT-5.5 on benchmarks at 86% less cost. The open-source AI race has a new leader, and it’s not in San Francisco.

🔍 THE BOTTOM LINE: DeepSeek V4-Pro matches or beats GPT-5.4 and Claude Opus 4.6 on coding benchmarks, supports a native 1-million-token context window, and is fully open source. DeepSeek V4-Flash — the smaller model — runs at 10% of the inference cost of V3.2. This isn’t a near-miss. This is parity.


🏆 The Numbers That Matter

DeepSeek V4 comes in two models:

                     V4-Pro                      V4-Flash
Total params         1.6 trillion                284 billion
Active params        49 billion                  13 billion
Context window       1 million tokens            1 million tokens
Best for             Frontier reasoning          Cost-effective serving
Codeforces rating    3,206 (#23 among humans)    n/a
Open weights         ✓                           ✓

On Codeforces, V4-Pro’s 3,206 rating puts it at 23rd among human competitive programmers. That’s the first time an open model has matched a closed frontier model on competitive programming.


📊 How It Compares

The benchmark table tells the story:

  • LiveCodeBench: V4-Pro scores 93.5% — ahead of Claude Opus 4.6 (88.8%) and Gemini 3.1 Pro (91.7%)
  • Codeforces: 3,206 rating, beating GPT-5.4’s 3,168
  • SWE-Bench Verified: 80.6%, tied with Claude Opus 4.6 and Gemini 3.1 Pro
  • MMLU-Pro: 87.5% — close, but still behind Gemini 3.1 Pro (91.0%)

Where V4 trails: general knowledge (Gemini 3.1 Pro still leads on MMLU-Pro, SimpleQA, GPQA Diamond), long-context retrieval (Claude Opus 4.6 retains the crown on MRCR at 1M tokens), and agentic coding (GPT-5.5’s 82.7% on Terminal Bench vs V4’s 67.9%).

The gap to proprietary frontier models has narrowed to months, not years.


🧠 The Architecture Breakthrough

V4’s headline feature isn’t just benchmarks — it’s the attention mechanism. Standard transformer attention scales quadratically with context length, which makes 1-million-token contexts prohibitively expensive. DeepSeek solved this with a hybrid approach:

  • Compressed Sparse Attention (CSA): Compresses the KV cache by 4x, then uses a lightning indexer to retrieve the top-k most relevant entries for each query
  • Heavily Compressed Attention (HCA): Aggressive 128x compression with dense attention over the compressed representation — a cheap global view of distant tokens
  • FP4 quantization for MoE expert weights — halves memory vs FP8
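The CSA idea — compress the KV cache, use an indexer to retrieve the most relevant entries, then attend densely over only those — can be sketched in miniature. This is a toy illustration, not DeepSeek's implementation: the mean-pooling compressor, the dot-product indexer, and all shapes here are assumptions.

```python
import math
import random

def compressed_sparse_attention(q, k, v, block=4, top_k=8):
    """Toy sketch of compressed sparse attention (CSA):
    1) compress the KV cache by `block`x via mean-pooling keys,
    2) score each compressed block against the query (the indexer step),
    3) run dense softmax attention over only the top-k retrieved blocks."""
    d = len(q)
    nblocks = len(k) // block

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # 1) mean-pool each run of `block` keys into one compressed key
    k_c = [[sum(k[b * block + i][j] for i in range(block)) / block
            for j in range(d)] for b in range(nblocks)]
    # 2) indexer stand-in: rank compressed blocks by similarity to the query
    ranked = sorted(range(nblocks), key=lambda b: dot(k_c[b], q), reverse=True)
    idx = [b * block + i for b in ranked[:top_k] for i in range(block)]
    # 3) softmax attention restricted to the retrieved token positions
    w = [math.exp(dot(k[i], q) / math.sqrt(d)) for i in idx]
    z = sum(w)
    return [sum(w[n] * v[i][j] for n, i in enumerate(idx)) / z
            for j in range(d)]

random.seed(0)
T, d = 64, 8
q = [random.gauss(0, 1) for _ in range(d)]
k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
out = compressed_sparse_attention(q, k, v)
print(len(out))  # 8
```

The payoff is in step 3: attention runs over only `top_k * block` tokens (32 here) instead of the full cache, which is how cost stays sublinear as context grows.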

The result at 1M tokens:

             FLOPs vs V3.2       KV cache vs V3.2
V4-Pro       27% (3.7x lower)    10% (9.5x smaller)
V4-Flash     10% (9.8x lower)    7% (13.7x smaller)

A 10x smaller KV cache means roughly 10x more concurrent long-context sessions per GPU. This is what makes million-token agents economically viable.
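The concurrency claim is simple division. A minimal back-of-envelope sketch, assuming a hypothetical 80 GB accelerator and a made-up 60 GB per-session V3.2 cache; only the 10% ratio comes from the article:

```python
# Back-of-envelope: how a 10x smaller KV cache scales concurrent sessions.
# The GB figures are illustrative assumptions, not published specs.
gpu_memory_gb = 80                    # hypothetical 80 GB accelerator
v32_cache_gb = 60                     # assumed V3.2 cache per 1M-token session
v4_cache_gb = v32_cache_gb * 0.10     # V4-Pro: 10% of V3.2's cache

sessions_v32 = int(gpu_memory_gb // v32_cache_gb)
sessions_v4 = int(gpu_memory_gb // v4_cache_gb)
print(sessions_v32, sessions_v4)  # 1 13
```

Whatever the real per-session numbers turn out to be, the ratio is what matters: shrinking the cache 10x multiplies the number of long-context sessions a single GPU can hold by roughly the same factor.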


💰 Pricing and Availability

DeepSeek hasn’t published final API pricing for V4 yet, but based on their history of undercutting OpenAI and Anthropic by 10-15x, expect:

  • V4-Flash: Significantly under $1/million input tokens
  • V4-Pro: Well under $5/million input tokens (vs GPT-5.5 at $5/$30 and Opus 4.7 at $5/$25)

Available now via:

  • API: deepseek-v4-pro and deepseek-v4-flash (OpenAI and Anthropic compatible)
  • Chat: chat.deepseek.com
  • Open weights: Hugging Face (DeepSeek collection)
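Since the endpoints are described as OpenAI-compatible, calling them should look like a standard chat-completions request with the model IDs above. A hedged sketch; the base URL, SDK usage, and parameter set are assumptions to verify against DeepSeek's documentation:

```python
# Sketch of an OpenAI-style request using the article's model IDs.
# Base URL and exact parameters are assumptions, not confirmed values.
def build_chat_request(prompt, model="deepseek-v4-flash", max_tokens=512):
    """Assemble kwargs for an OpenAI-style chat.completions.create call."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# With the openai SDK (assumed usage):
#   from openai import OpenAI
#   client = OpenAI(base_url="https://api.deepseek.com", api_key="...")
#   resp = client.chat.completions.create(**build_chat_request("Hello"))
print(build_chat_request("Hello")["model"])  # deepseek-v4-flash
```

Swap `model` to `deepseek-v4-pro` for the frontier tier; everything else in the request stays the same, which is the point of OpenAI compatibility.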

V3 endpoints (deepseek-chat, deepseek-reasoner) retire July 24, 2026.


🇨🇳 Three Labs, Four Days

The timing matters. This week:

  • Monday: Moonshot drops Kimi K2.6
  • Wednesday: Alibaba drops Qwen 3.6-27B
  • Thursday: DeepSeek drops V4

Three Chinese labs, three frontier open-source models, in under four days. Combined with Huawei announcing Ascend 950 chip support for DeepSeek V4, China is building a self-contained AI stack: Chinese weights, Chinese chips, Chinese inference software.


⚠️ What’s Still Missing

DeepSeek is candid about limitations:

  • Multimodal: V4 is text-only. No images, audio, or video
  • Knowledge gap: Trails Gemini 3.1 Pro on knowledge-heavy benchmarks
  • Long-context ceiling: Retrieval accuracy degrades above 128K tokens (66% at 1M)
  • Agentic coding: GPT-5.5 and Opus 4.7 still lead on Terminal Bench and SWE Pro
  • Architecture complexity: DeepSeek calls V4 “relatively complex” and plans to simplify future versions

🔍 THE BOTTOM LINE

The question isn’t whether open-source models can compete at the frontier anymore. DeepSeek V4 answers that: they can, and they’re cheaper. The question is what happens when million-token context windows become routine, when Chinese labs ship three frontier models in four days, and when the cost of frontier AI drops by 86% in a single release.

For developers: V4-Flash is your new default. For agent builders: the CSA/HCA architecture is what makes long-horizon agents economically viable. For everyone else: the AI you can run yourself just caught up with the AI you rent.


📚 Sources

  • DeepSeek AI
  • Hugging Face Blog
  • @cryptopunk7213 on X