Three Chinese AI labs released frontier open-source models in four days. The last one — DeepSeek V4 — just matched GPT-5.5 on benchmarks at 86% less cost. The open-source AI race has a new leader, and it’s not in San Francisco.
🔍 THE BOTTOM LINE: DeepSeek V4-Pro matches or beats GPT-5.4 and Claude Opus 4.6 on coding benchmarks, supports a native 1-million-token context window, and is fully open source. DeepSeek V4-Flash — the smaller model — runs at 10% of the inference cost of V3.2. This isn’t a near-miss. This is parity.
🏆 The Numbers That Matter
DeepSeek V4 comes in two models:
| | V4-Pro | V4-Flash |
|---|---|---|
| Total params | 1.6 trillion | 284 billion |
| Active params | 49 billion | 13 billion |
| Context window | 1 million tokens | 1 million tokens |
| Best for | Frontier reasoning | Cost-effective serving |
| Codeforces rating | 3,206 (#23 among humans) | — |
| Open weights | ✅ | ✅ |
On Codeforces, V4-Pro’s 3,206 rating puts it at 23rd among human competitive programmers. That’s the first time an open model has matched a closed frontier model on competitive programming.
📊 How It Compares
The benchmark table tells the story:
- LiveCodeBench: V4-Pro scores 93.5% — ahead of Claude Opus 4.6 (88.8%) and Gemini 3.1 Pro (91.7%)
- Codeforces: 3,206 rating, beating GPT-5.4’s 3,168
- SWE-Bench Verified: 80.6%, tied with Claude Opus 4.6 and Gemini 3.1 Pro
- MMLU-Pro: 87.5% — close but behind Gemini 3.1 Pro (91.0%)
Where V4 trails: general knowledge (Gemini 3.1 Pro still leads on MMLU-Pro, SimpleQA, GPQA Diamond), long-context retrieval (Claude Opus 4.6 retains the crown on MRCR at 1M tokens), and agentic coding (GPT-5.5’s 82.7% on Terminal Bench vs V4’s 67.9%).
The gap to proprietary frontier models has narrowed to months, not years.
🧠 The Architecture Breakthrough
V4’s headline feature isn’t just benchmarks — it’s the attention mechanism. Standard transformer attention scales quadratically with context length, which makes 1-million-token contexts prohibitively expensive. DeepSeek solved this with a hybrid approach:
- Compressed Sparse Attention (CSA): Compresses the KV cache by 4x, then uses a lightning indexer to retrieve the top-k most relevant entries for each query
- Heavily Compressed Attention (HCA): Aggressive 128x compression with dense attention over the compressed representation — a cheap global view of distant tokens
- FP4 quantization for MoE expert weights — halves memory vs FP8
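The CSA idea can be sketched in a few lines: compress the KV cache into block summaries, score those summaries with a cheap indexer, and run dense attention only over the tokens in the top-k scoring blocks. This is a toy sketch of the mechanism as described above, not DeepSeek's implementation — the mean-pooling compressor, dot-product indexer, and all shapes are illustrative assumptions:

```python
import numpy as np

def csa_attention(q, K, V, compress_ratio=4, top_k=64):
    """Toy sketch of Compressed Sparse Attention for one query vector.

    1. Compress the KV cache by mean-pooling blocks of `compress_ratio` tokens
       (a stand-in for a learned compressor).
    2. Score the compressed entries with a cheap dot-product "indexer".
    3. Attend densely only over tokens in the top-k scoring blocks.
    """
    d = q.shape[-1]
    n = (K.shape[0] // compress_ratio) * compress_ratio
    K, V = K[:n], V[:n]

    # Step 1: 4x compression via block mean-pooling.
    K_c = K.reshape(-1, compress_ratio, d).mean(axis=1)

    # Step 2: lightweight indexer picks the top-k most relevant blocks.
    scores = K_c @ q
    top_blocks = np.argsort(scores)[-top_k:]

    # Step 3: gather the original tokens of those blocks, attend densely.
    idx = (top_blocks[:, None] * compress_ratio + np.arange(compress_ratio)).ravel()
    logits = K[idx] @ q / np.sqrt(d)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V[idx]

# Usage: one query over a 4,096-token toy cache; attention touches only
# 64 blocks * 4 tokens = 256 of the 4,096 cached tokens.
rng = np.random.default_rng(0)
q = rng.standard_normal(128)
K = rng.standard_normal((4096, 128))
V = rng.standard_normal((4096, 128))
out = csa_attention(q, K, V)
print(out.shape)  # (128,)
```

HCA would then add the cheap global view: the same dense attention, but over the heavily (128x) compressed representation of the entire cache rather than a selected subset.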
The result at 1M tokens:
| | FLOPs vs V3.2 | KV Cache vs V3.2 |
|---|---|---|
| V4-Pro | 27% (3.7x lower) | 10% (9.5x smaller) |
| V4-Flash | 10% (9.8x lower) | 7% (13.7x smaller) |
A 10x smaller KV cache means roughly 10x more concurrent long-context sessions per GPU. This is what makes million-token agents economically viable.
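The back-of-envelope math behind that claim, with illustrative numbers (an 80 GB accelerator and a hypothetical per-token cache footprint — neither figure is from DeepSeek; only the "10% of V3.2" ratio comes from the table above):

```python
# Rough estimate of concurrent full-context sessions per GPU, counting
# only KV-cache memory and ignoring weights and activations.
GPU_MEM_BYTES = 80 * 10**9        # one 80 GB accelerator (assumption)
CTX_TOKENS = 1_000_000            # 1M-token context

v32_kv_bytes_per_token = 2_000    # hypothetical V3.2-style KV footprint
v4pro_kv_bytes_per_token = v32_kv_bytes_per_token // 10  # 10% of V3.2

def sessions_per_gpu(bytes_per_token: int) -> int:
    """How many full-context KV caches fit in GPU memory."""
    return GPU_MEM_BYTES // (CTX_TOKENS * bytes_per_token)

print(sessions_per_gpu(v32_kv_bytes_per_token))    # 40
print(sessions_per_gpu(v4pro_kv_bytes_per_token))  # 400
```

Whatever the absolute footprint turns out to be, the 10x cache reduction translates directly into a 10x increase in concurrent long-context sessions per card.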
💰 Pricing and Availability
DeepSeek hasn’t published final API pricing for V4 yet, but based on their history of undercutting OpenAI and Anthropic by 10-15x, expect:
- V4-Flash: Significantly under $1/million input tokens
- V4-Pro: Well under $5/million input tokens (vs GPT-5.5 at $5/$30 and Opus 4.7 at $5/$25)
Available now via:
- API: `deepseek-v4-pro` and `deepseek-v4-flash` (OpenAI- and Anthropic-compatible)
- Chat: chat.deepseek.com
- Open weights: Hugging Face (DeepSeek collection)
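Because the endpoints are OpenAI-compatible, calling V4 should look like any OpenAI-SDK call pointed at DeepSeek's base URL. A sketch, assuming the model IDs above and DeepSeek's existing `https://api.deepseek.com` base URL — verify both against the official docs before relying on them:

```python
from openai import OpenAI

# Model IDs and base URL are assumptions taken from this article;
# check DeepSeek's current documentation before use.
client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key="YOUR_DEEPSEEK_API_KEY",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro" for frontier reasoning
    messages=[{"role": "user", "content": "Summarize this changelog in one line."}],
)
print(resp.choices[0].message.content)
```

Existing OpenAI- or Anthropic-SDK code should migrate by swapping the base URL, API key, and model name.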
V3 endpoints (deepseek-chat, deepseek-reasoner) retire July 24, 2026.
🇨🇳 Three Labs, Four Days
The timing matters. This week:
- Monday: Moonshot drops Kimi K2.6
- Wednesday: Alibaba drops Qwen 3.6-27B
- Thursday: DeepSeek drops V4
Three Chinese labs, three frontier open-source models, in under four days. Combined with Huawei announcing Ascend 950 chip support for DeepSeek V4, China is building a self-contained AI stack: Chinese weights, Chinese chips, Chinese inference software.
⚠️ What’s Still Missing
DeepSeek is candid about limitations:
- Multimodal: V4 is text-only. No images, audio, or video
- Knowledge gap: Trails Gemini 3.1 Pro on knowledge-heavy benchmarks
- Long-context ceiling: Retrieval accuracy degrades above 128K tokens (66% at 1M)
- Agentic coding: GPT-5.5 and Opus 4.7 still lead on Terminal Bench and SWE Pro
- Architecture complexity: DeepSeek calls V4 “relatively complex” and plans to simplify future versions
🔍 THE BOTTOM LINE
The question isn’t whether open-source models can compete at the frontier anymore. DeepSeek V4 answers that: they can, and they’re cheaper. The question is what happens when million-token context windows become routine, when Chinese labs ship three frontier models in four days, and when the cost of frontier AI drops by 86% in a single release.
For developers: V4-Flash is your new default. For agent builders: the CSA/HCA architecture is what makes long-horizon agents economically viable. For everyone else: the AI you can run yourself just caught up with the AI you rent.