Alibaba’s Qwen3.7-Max ran a fully autonomous 35-hour kernel optimisation task targeting Alibaba’s own custom silicon, achieving a 10× speedup over the reference implementation. The model made 1,158 tool calls, ran 432 kernel tests, and iterated on its own, with no human in the loop. This isn’t a demo. It’s a production workload.
🔍 THE BOTTOM LINE
When an AI model can run unsupervised for 35 hours writing low-level code that humans can’t easily read or verify, “can AI code?” is no longer the question. “Can you direct what AI builds?” is.
What Actually Happened
Alibaba’s latest model, Qwen3.7-Max, was given a task: optimise kernel-level code for Alibaba’s proprietary T-Head-ZW-M890 accelerator. This isn’t Python scripting or web development — this is the deepest layer of systems programming, where code talks directly to silicon.
The model ran for 35 continuous hours. During that time it:
- Made 1,158 tool calls (compiling, testing, profiling, iterating)
- Ran 432 kernel tests across multiple optimisation attempts
- Achieved a 10× speedup over the reference kernel implementation
- Required zero human intervention after the initial prompt
The result: working kernel-level optimisations for custom hardware — code that was never designed for human readability because it was targeting a chip that only exists inside Alibaba’s infrastructure.
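The compile, test, profile, iterate cycle described above can be pictured with a small sketch. This is not Alibaba's harness or the model's actual tooling, neither of which is public. It is a toy loop under loud assumptions: the "kernel" is a plain Python sum-of-squares function, the candidate rewrites are hard-coded rather than proposed by a model, and every name here (`reference_kernel`, `passes_tests`, `optimise`) is hypothetical.

```python
import math
import random
import time
from typing import Callable, List, Tuple

Kernel = Callable[[List[float]], float]


def reference_kernel(xs: List[float]) -> float:
    """Toy 'reference implementation': sum of squares with an explicit loop."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


def candidate_wrong(xs: List[float]) -> float:
    """A cheap but incorrect rewrite; the test gate should reject it."""
    return sum(xs)


def candidate_generator_sum(xs: List[float]) -> float:
    """A correct rewrite that uses the built-in sum."""
    return sum(x * x for x in xs)


def passes_tests(candidate: Kernel, reference: Kernel, trials: int = 20) -> bool:
    """Compare candidate and reference on one fixed and several random inputs."""
    rng = random.Random(0)  # fixed seed so the test set is reproducible
    cases = [[2.0, 3.0]]
    cases += [[rng.uniform(-1.0, 1.0) for _ in range(64)] for _ in range(trials)]
    return all(
        math.isclose(candidate(c), reference(c), rel_tol=1e-9, abs_tol=1e-12)
        for c in cases
    )


def time_kernel(kernel: Kernel, data: List[float], repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(data)
        best = min(best, time.perf_counter() - start)
    return best


def optimise(reference: Kernel, candidates: List[Kernel]) -> Tuple[Kernel, float, int]:
    """Keep the fastest candidate that passes the tests.

    Returns (best kernel, speedup over reference, number of candidates tested).
    """
    data = [float(i % 7) for i in range(10_000)]
    ref_time = time_kernel(reference, data)
    best, best_time, tests_run = reference, ref_time, 0
    for cand in candidates:
        tests_run += 1
        if not passes_tests(cand, reference):
            continue  # reject incorrect rewrites before timing them
        t = time_kernel(cand, data)
        if t < best_time:
            best, best_time = cand, t
    return best, ref_time / max(best_time, 1e-12), tests_run


if __name__ == "__main__":
    best, speedup, tests_run = optimise(
        reference_kernel, [candidate_wrong, candidate_generator_sum]
    )
    print(f"kept {best.__name__}, speedup {speedup:.2f}x after {tests_run} tests")
```

The point of the sketch is the ordering: correctness is checked before speed, and the reference stays in place unless a candidate is both correct and faster. The printed speedup will vary from machine to machine.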
The Benchmark Context
Qwen3.7-Max wasn’t the only model tested on this task. Alibaba ran a comparison:
| Model | Speedup over Reference |
|---|---|
| Qwen3.7-Max | 10× |
| GLM 5.1 | 7.3× |
| Kimi K2.6 | 5× |
| DeepSeek V4 Pro | 3.3× |
Qwen3.7-Max leads, but the more important signal is the cluster: multiple models can now autonomously optimise hardware-specific code. Qwen3.7-Max scored highest on this task, but the capability itself is becoming table stakes for frontier models.
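To put the table's multipliers in perspective, here is a short arithmetic sketch. It assumes the usual definition of speedup (reference runtime divided by optimised runtime), which the source does not spell out. The helper names are illustrative only.

```python
# Speedup figures copied from the comparison table.
speedups = {
    "Qwen3.7-Max": 10.0,
    "GLM 5.1": 7.3,
    "Kimi K2.6": 5.0,
    "DeepSeek V4 Pro": 3.3,
}


def runtime_fraction(speedup: float) -> float:
    """Fraction of the reference runtime that remains after optimisation."""
    return 1.0 / speedup


def relative_lead(a: float, b: float) -> float:
    """How many times faster a kernel with speedup a is than one with speedup b."""
    return a / b


if __name__ == "__main__":
    top = speedups["Qwen3.7-Max"]
    for name, s in speedups.items():
        print(
            f"{name}: {runtime_fraction(s):.0%} of reference runtime, "
            f"Qwen3.7-Max lead {relative_lead(top, s):.2f}x"
        )
```

Under that assumption, a 10× kernel runs in a tenth of the reference time, and the gap between the top and bottom entries in the table is roughly 3×.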
What is Kernel Optimisation?
Kernel optimisation, in this context, is the process of rewriting compute kernels: the small, performance-critical routines (such as matrix multiplication) that run directly on an accelerator. The term is unrelated to an operating-system kernel. It’s some of the most specialised programming that exists: hardware-specific, performance-critical, and traditionally requiring deep expertise that very few engineers have.
When an AI model does this autonomously, it’s not writing a web app or generating a script. It’s operating at a layer where most human engineers don’t go, and sustaining that work for 35 hours without supervision.
Why This Matters More Than It Sounds
This story is getting less coverage than the Mythos vulnerability findings or the GPT-5 Erdős proof, yet it arguably matters more than either.
The Erdős proof is a milestone for pure mathematics — intellectually thrilling, but abstract. It doesn’t change how anyone builds anything tomorrow.
The Mythos findings expose a patching crisis — important, but fundamentally about an existing system failing to keep up.
The Qwen3.7-Max result is about a new system taking over a category of specialist work. Kernel optimisation for custom silicon is a task that, until now, required rare specialist engineers. If an AI can do it autonomously for 35 hours, the bottleneck shifts from “can we find people who can do this?” to “can we verify what the AI built?”
That’s a different problem. And it’s one nobody has solved yet.
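One partial answer to the verification problem is differential testing: run the generated kernel and a trusted reference on the same inputs and look for any disagreement. The sketch below is a toy illustration, not a description of how Alibaba checks its kernels. The `saxpy` routine (computing `a * x + y` element by element), the deliberately buggy "optimised" version, and the `find_mismatch` helper are all hypothetical.

```python
import random
from typing import Callable, List, Optional

Saxpy = Callable[[float, List[float], List[float]], List[float]]


def saxpy_reference(a: float, x: List[float], y: List[float]) -> List[float]:
    """Trusted, readable version: a * x[i] + y[i] for every element."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def saxpy_unrolled_buggy(a: float, x: List[float], y: List[float]) -> List[float]:
    """Hypothetical 'optimised' version, unrolled by 4, that forgets the tail."""
    out = list(y)
    full = len(x) - len(x) % 4  # largest multiple of 4 that fits
    for i in range(0, full, 4):
        out[i] = a * x[i] + y[i]
        out[i + 1] = a * x[i + 1] + y[i + 1]
        out[i + 2] = a * x[i + 2] + y[i + 2]
        out[i + 3] = a * x[i + 3] + y[i + 3]
    return out  # bug: elements past 'full' still hold y, not a * x + y


def find_mismatch(
    reference: Saxpy,
    candidate: Saxpy,
    max_len: int = 16,
    trials_per_len: int = 5,
    seed: int = 0,
) -> Optional[int]:
    """Return the smallest input length where the outputs differ, else None."""
    rng = random.Random(seed)
    for n in range(max_len + 1):
        for _ in range(trials_per_len):
            a = rng.uniform(0.5, 2.0)
            x = [rng.uniform(1.0, 2.0) for _ in range(n)]
            y = [rng.uniform(1.0, 2.0) for _ in range(n)]
            # Exact comparison works here because both versions evaluate the
            # same expression; real kernels usually need a tolerance.
            if reference(a, x, y) != candidate(a, x, y):
                return n
    return None


if __name__ == "__main__":
    print(find_mismatch(saxpy_reference, saxpy_unrolled_buggy))
```

The buggy version is correct whenever the input length is a multiple of four, so a test suite that only used "nice" sizes would pass it. Sweeping every length from zero upward catches the fault at length 1. Passing such a sweep raises confidence but proves nothing about untested inputs, which is why verification remains the open problem.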
The Open-Source Question
There’s a wrinkle. Qwen3.7-Max is proprietary — Alibaba’s last open-weight flagship was Qwen3.5, released in February 2026. The model that ran this 35-hour autonomous task is closed-source, running on closed-source silicon, producing closed-source optimisations.
Alibaba was previously one of the most aggressive open-source AI publishers. The shift from open weights to proprietary releases mirrors a broader industry pattern: frontier capabilities stay closed, while slightly-older models get open-sourced for goodwill and ecosystem lock-in.
For the open-source community, this means the most impressive agentic AI results are coming from models they can’t inspect, modify, or deploy independently.
❓ Frequently Asked Questions
Q: Is the code Qwen3.7-Max produced available to inspect?
A: No. The optimised kernel code is proprietary and targets Alibaba’s custom T-Head silicon. The model itself is closed-source. You can’t audit what it built or how it built it.
Q: Does this mean AI can now replace chip engineers?
A: Not yet, but it can replace certain categories of chip engineering work. Kernel optimisation for known hardware targets is now automatable. Novel chip design, architecture decisions, and system-level trade-offs still need human judgment. The boundary is moving, though.
Q: What does this mean for NZ’s tech sector?
A: NZ has limited custom silicon work, but the principle applies: if AI can autonomously optimise for hardware it was pointed at, it could plausibly also optimise cloud infrastructure, network configurations, and system deployments. The same capability that optimised Alibaba’s chips could optimise your AWS spend, and that’s a much wider market.
🔍 THE BOTTOM LINE
Thirty-five hours. Zero humans. Ten times faster. The question stopped being “can AI write low-level code?” somewhere around hour twelve. The question now is who verifies what the AI built — and whether we’re comfortable running infrastructure we can’t read.
Sources
- The Decoder