Xiaomi's MiMo Code Just Beat Claude Code on Long-Horizon Coding Tasks — and It's Open Source

Xiaomi has released MiMo Code v0.1.0 as open source, claiming it outperforms Anthropic’s Claude Code on long-horizon coding tasks that span 200+ steps. In blind tests with 576 human developers, MiMo Code reportedly beat Claude Code on multi-file refactors and multi-step programming tasks without participants knowing which agent they were using.

The release is paired with MiMo V2.5, a multimodal model with a 1M token context window, available free for a limited time. The combination of an open-source coding agent and a free frontier-tier model undercuts the closed-source coding-agent pricing model that Anthropic, GitHub, and Cursor have spent the last 12 months building.

🔍 THE BOTTOM LINE This is the first open-source coding agent to credibly claim leadership on the dimension that matters most for production work — long-running, multi-step, multi-file tasks. The benchmark gap was the last justification for closed-source coding-agent pricing. That justification just got weaker.

What MiMo Code actually is

MiMo Code is a terminal-based AI coding agent — the same shape as Claude Code, Anthropic’s flagship developer product. The implementation forks the open-source AI agent OpenCode and adds two key innovations:

1. Multi-agent task evaluation. When executing a task, MiMo Code runs multiple AI agents in parallel to generate candidate task execution plans. A judge agent then picks the most promising candidate and proceeds with execution. This is closer to best-of-N selection with a judge than to a full “tree-of-thought” search, and it is applied to task planning rather than just reasoning. Xiaomi reports measurable improvements over single-agent execution in its own benchmarks.

2. Deterministic skill execution. AI agents like Claude Code can use Skills, which are sets of instructions in natural language written as SKILL.md files. But when LLMs interpret natural language instructions, results are inconsistent — the same SKILL.md can produce different outcomes on different runs. MiMo Code introduces a mechanism that generates JavaScript code from SKILL.md files, allowing for deterministic task execution. The natural-language spec becomes a compiled, reproducible behaviour.
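The plan-then-judge idea in the first point can be sketched in a few lines. This is a minimal illustration, not MiMo Code's implementation: the planners and the `judge_score` heuristic are hypothetical stand-ins for what would be model calls in a real agent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Plan:
    planner: str
    steps: List[str]


# Hypothetical planner factory: a real agent would call an LLM here.
def make_planner(name: str, steps: List[str]) -> Callable[[str], Plan]:
    def plan(task: str) -> Plan:
        return Plan(planner=name, steps=[f"{s} ({task})" for s in steps])
    return plan


# Hypothetical judge: prefers plans that include a test step, then shorter
# plans. A real judge would be another model call.
def judge_score(plan: Plan) -> float:
    has_tests = any("test" in s.lower() for s in plan.steps)
    return (10.0 if has_tests else 0.0) - len(plan.steps)


def choose_plan(task: str, planners: List[Callable[[str], Plan]]) -> Plan:
    # Generate candidate plans in parallel, then let the judge pick one.
    with ThreadPoolExecutor(max_workers=len(planners)) as pool:
        candidates = list(pool.map(lambda p: p(task), planners))
    return max(candidates, key=judge_score)


planners = [
    make_planner("quick", ["edit files"]),
    make_planner("careful", ["read call sites", "edit files", "run tests"]),
    make_planner("verbose", ["read everything", "write notes", "edit files", "review", "commit"]),
]
best = choose_plan("rename UserService", planners)
print(best.planner, len(best.steps))
```

The design point is that only the winning plan is executed, so the extra cost is paid at planning time rather than in repeated failed runs.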

The combination is what makes the benchmark claim credible. Long-horizon coding tasks often fail not because the model is bad, but because the agent loses track of context, interprets instructions inconsistently, or gets stuck in loops. A multi-agent planner targets weak or looping plans, and deterministic skill execution targets inconsistent interpretation.
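To make the "compiled skill" idea concrete, here is a small sketch under stated assumptions. MiMo Code is described as generating JavaScript from SKILL.md files; the article does not say how, so this example uses Python, a deliberately simplified step-list format, and a hypothetical `OPERATIONS` table purely to show why a compiled pipeline gives the same result on every run.

```python
import re
from typing import Callable, Dict, List

# Simplified, hypothetical skill file: real SKILL.md files are free-form prose.
SKILL_MD = """\
# Skill: normalise-title
1. strip
2. collapse-spaces
3. title-case
"""

# Hypothetical operation table: each step name maps to a pure function.
OPERATIONS: Dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "collapse-spaces": lambda s: re.sub(r"\s+", " ", s),
    "title-case": str.title,
}


def compile_skill(markdown: str) -> Callable[[str], str]:
    """Turn the numbered steps into a fixed pipeline, failing fast on unknown steps."""
    names = re.findall(r"^\d+\.\s+(\S+)\s*$", markdown, flags=re.MULTILINE)
    steps: List[Callable[[str], str]] = []
    for name in names:
        if name not in OPERATIONS:
            raise ValueError(f"unknown step: {name}")
        steps.append(OPERATIONS[name])

    def run(text: str) -> str:
        # The same steps run in the same order on every call.
        for step in steps:
            text = step(text)
        return text

    return run


normalise = compile_skill(SKILL_MD)
result = normalise("  mimo   code  release ")
print(result)
```

Once the skill is compiled, no model is consulted at run time, which is where the reproducibility comes from.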

The benchmark results

Xiaomi’s published benchmarks compare four configurations on long-horizon coding tasks:

Configuration	Score
MiMo-V2.5-Pro with MiMo Code	Highest
MiMo-V2.5 with MiMo Code	High
MiMo-V2.5-Pro with Claude Code	Medium
Claude Sonnet 4.6 with Claude Code	Lower

The exact deltas are blurred in the public reporting, but the relative ranking is consistent: MiMo Code + MiMo V2.5-Pro beats Claude Code + Claude Sonnet 4.6 on multi-step coding tasks.

The 576-developer blind test adds a layer of human validation. The methodology was standard A/B: same task, two agents, no labels, developers pick the better result. MiMo Code won the majority.
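"Won the majority" is hard to interpret without a win count, which the reporting does not give. As a rough aid, the sketch below computes how many wins out of 576 would be needed for a one-sided exact binomial test to reject a 50/50 split at the 5% level. It assumes one independent preference per developer and no ties, neither of which is confirmed by the source.

```python
from math import comb


def one_sided_p_value(wins: int, trials: int) -> float:
    """P(X >= wins) for X ~ Binomial(trials, 0.5), computed exactly."""
    tail = sum(comb(trials, k) for k in range(wins, trials + 1))
    return tail / 2 ** trials


def min_wins_for_significance(trials: int, alpha: float = 0.05) -> int:
    """Smallest win count whose one-sided p-value falls below alpha."""
    for wins in range(trials // 2, trials + 1):
        if one_sided_p_value(wins, trials) < alpha:
            return wins
    raise ValueError("no win count reaches the requested significance level")


TRIALS = 576  # number of developers reported in the blind test
threshold = min_wins_for_significance(TRIALS)
print(f"wins needed out of {TRIALS}: {threshold}")
```

A bare majority of 289 wins would not clear this bar, so the actual split matters for how much weight the blind test can carry.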

What this means for the coding-agent market

The closed-source coding-agent market has three pricing tiers:

Free / freemium (Cursor, Copilot, Windsurf free tier)
Subscription ($20-100/month per developer)
API consumption (Claude Code, OpenAI Codex at $3-15/M input tokens)

MiMo Code directly pressures the subscription and API-consumption tiers. The agent is free, the MiMo V2.5 model is free for a limited time, and the open-source licence means the agent can be self-hosted against any model the developer chooses. For a solo developer or a small team, the question “should I pay $20/month for Cursor or Copilot?” now has a real “use MiMo Code” answer.

For enterprises, the disruption is more nuanced. Self-hosting an open-source coding agent means accepting the operational cost of running the model, the security risk of running third-party agent code, and the procurement complexity of supporting a tool that’s not on a vendor’s product roadmap. Those costs are real but bounded — and for cost-sensitive teams, the calculus is shifting.

Why this matters for New Zealand

The NZ tech sector is small enough that developer-tooling costs add up disproportionately. A Wellington or Auckland engineering team of 10 paying $200/month each for Claude Code + Cursor is spending $24,000/year on coding-agent subscriptions. MiMo Code is free. The total cost becomes compute (which can be local on existing GPU hardware, as we covered in DiffusionGemma) plus the operational overhead of running the agent.
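The subscription arithmetic above is easy to rerun for a different team size. In this sketch the 10 developers and $200/month come from the paragraph; `HYPOTHETICAL_SELF_HOSTING` is a placeholder for compute and operational overhead, not a measured figure.

```python
MONTHS_PER_YEAR = 12


def annual_subscription_cost(developers: int, monthly_per_developer: float) -> float:
    """Yearly spend on per-seat tooling subscriptions."""
    return developers * monthly_per_developer * MONTHS_PER_YEAR


def annual_saving(subscription_cost: float, self_hosting_cost: float) -> float:
    """Positive when self-hosting is cheaper than subscriptions."""
    return subscription_cost - self_hosting_cost


subscriptions = annual_subscription_cost(developers=10, monthly_per_developer=200)

# Placeholder value for illustration only; substitute your own estimate.
HYPOTHETICAL_SELF_HOSTING = 6_000

print(subscriptions)  # 24000
print(annual_saving(subscriptions, HYPOTHETICAL_SELF_HOSTING))
```

The saving only materialises if the self-hosting estimate includes staff time as well as hardware.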

For NZ’s AI Blueprint for Aotearoa to deliver real productivity gains, developer-tooling costs are the lowest-hanging fruit. The case for funding NZ open-source coding agent deployment — both as a productivity intervention and as sovereign digital infrastructure — is now significantly stronger.

What MiMo V2.5 brings

MiMo V2.5 is the multimodal model that powers MiMo Code:

1M token context window — enough for an entire small codebase
Multimodal input — text and image
Free for a limited time — no pricing details yet, but the open-source weights are coming
Optimised for long-horizon tasks — built to handle the planning overhead that 200+ step tasks require
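Whether a codebase actually fits in a 1M token window can be estimated before trying it. The sketch below uses a rough 4-characters-per-token heuristic, which is an assumption rather than MiMo's tokenizer, and reserves part of the window for instructions and output. The function names and the 20% reserve are illustrative choices.

```python
from pathlib import Path
from typing import Iterable, Iterator

CHARS_PER_TOKEN = 4         # rough heuristic; real tokenizers vary by content
CONTEXT_WINDOW = 1_000_000  # window size stated for MiMo V2.5


def read_sources(root: str, suffixes: Iterable[str] = (".py", ".js", ".ts")) -> Iterator[str]:
    """Yield the text of every source file under root with a wanted suffix."""
    wanted = tuple(suffixes)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in wanted:
            yield path.read_text(encoding="utf-8", errors="ignore")


def estimate_tokens(texts: Iterable[str]) -> int:
    """Approximate token count from total character count."""
    return sum(len(text) for text in texts) // CHARS_PER_TOKEN


def fits_in_context(tokens: int, window: int = CONTEXT_WINDOW, reserve: float = 0.2) -> bool:
    """True if the tokens fit after reserving a fraction of the window."""
    return tokens <= window * (1 - reserve)


sample = ["x" * 4000, "y" * 2000]  # stand-in for file contents
tokens = estimate_tokens(sample)
print(tokens, fits_in_context(tokens))
```

For a real project, `estimate_tokens(read_sources("path/to/repo"))` gives a first approximation; the path here is a placeholder.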

The model itself isn’t the story — Gemini, Claude, and GPT all have 1M+ token context models. The story is the agent harness that knows how to use the context window for sustained, multi-file work.

What comes next

Xiaomi’s release puts pressure on every closed-source coding agent vendor to demonstrate value beyond raw model access. The differentiation will move up the stack:

Agent reliability — how often does the agent complete a 200-step task without human intervention?
Tool integration — how well does the agent connect to enterprise systems, IDEs, and CI/CD pipelines?
Security and auditability — what evidence trail does the agent leave?
Specialised capabilities — code review, test generation, refactoring, security analysis
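The agent reliability question in the list above can be turned into a simple metric. The sketch below is one possible definition, assuming a hypothetical `Run` record per task; the sample runs are invented for illustration and are not benchmark data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Run:
    steps: int          # number of agent steps taken
    completed: bool     # did the task finish successfully?
    interventions: int  # times a human had to step in


def unattended_completion_rate(runs: List[Run], min_steps: int = 200) -> float:
    """Share of long runs that completed with zero human interventions."""
    long_runs = [r for r in runs if r.steps >= min_steps]
    if not long_runs:
        raise ValueError("no runs reach the minimum step count")
    unattended = sum(1 for r in long_runs if r.completed and r.interventions == 0)
    return unattended / len(long_runs)


# Invented sample data for illustration only.
runs = [
    Run(steps=250, completed=True, interventions=0),
    Run(steps=310, completed=True, interventions=2),
    Run(steps=205, completed=False, interventions=0),
    Run(steps=40, completed=True, interventions=0),
    Run(steps=220, completed=True, interventions=0),
]
rate = unattended_completion_rate(runs)
print(rate)  # 0.5
```

Filtering on `min_steps` keeps short tasks from inflating the figure, since the claim under test is specifically about 200+ step work.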

Anthropic’s response will be interesting. The competitive moat for Claude Code has been the Claude model itself; if open-source models reach parity on long-horizon tasks, the moat shifts to the harness, the integrations, and the enterprise procurement story. Claude Code’s recent GitHub Actions supply-chain vulnerability disclosure — and the broader vibe coding security reckoning — show that the integration layer has its own costs.

The bigger question: if Xiaomi can ship a free, open-source coding agent that beats Claude Code, what’s stopping Mistral, Alibaba, or DeepSeek from shipping one next month that beats MiMo Code? The race is now genuinely open at the agent layer.

❓ FAQ

Is MiMo Code really better than Claude Code? On long-horizon tasks (200+ steps), the Xiaomi benchmarks say yes. On short-form coding assistance (single-file edits, autocomplete), Claude Code is probably still competitive. The blind test with 576 developers is the strongest signal we have.

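Can I run it locally? Partly. MiMo Code itself is open source and runs in your terminal. The MiMo V2.5 model is free for a limited time, but its open weights are still to come, so fully local inference has to wait for that release or use another model you host yourself. Once the weights are out, local deployment is expected to need GPU resources comparable to other 7-9B parameter models.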

What about Claude Code’s enterprise features? Claude Code’s enterprise tier includes SSO, audit logging, and GitHub Actions integration. MiMo Code doesn’t have those out of the box, but the open-source ecosystem will likely catch up. The benchmark gap on raw long-horizon tasks is what the closed-source vendors have to close first.

Will this hurt Anthropic’s revenue? Probably not directly — Claude Code’s enterprise tier is the real revenue, and most enterprise customers aren’t going to switch on benchmark claims alone. But the narrative shifts: open-source coding agents are no longer “almost as good.” They’re now “better in some scenarios, free, and you control the data.”

Is this another DeepSeek moment? Not quite. DeepSeek R1 was a model release that shocked the US frontier labs because the open-source weights matched closed-source capability at a fraction of the training cost. MiMo Code is an agent release that uses a competitive open-source model. The pattern is similar but the surprise is smaller — open-source coding agents were already catching up.

🔍 THE BOTTOM LINE

Xiaomi’s MiMo Code release is the first open-source coding agent to credibly claim long-horizon leadership over Claude Code. The benchmark gap was the last justification for closed-source coding-agent pricing in the consumer and prosumer tier. That justification is now much weaker.

AI-Edu — June 12, 2026
