Fable 5 Wrote a Windows Kernel in 38 Minutes — The Verification Gap Is Now

Anthropic’s Mythos-class model, Claude Fable 5, generated a booting, NT-compatible Windows kernel written entirely in Rust — the project ntoskrnl-rs — in a single 38-minute session. The achievement, detailed by Tolmo’s Twinkle and Matt Suiche on June 22, 2026, is less a technological milestone and more a stress test of the software industry’s verification capacity. A model can generate a kernel. The open question is what that tells us about where infrastructure software is going, and what has to be true before we trust any of it.

🔍 The Bottom Line

Fable 5 authored a Trusted Computing Base (TCB) faster than any human team could review it. The primary bottleneck in modern software development is no longer writing code. It is trust and validation.

What Happened in 38 Minutes

Fable wrote approximately 5,200 lines of code across 27 files in a single continuous session of 197 assistant turns. The timeline is remarkably compressed:

Time	Milestone
13:35	Empty repository
13:46	Traps, KPCR, and interrupt handling
13:51	Scheduler and dispatcher objects
13:57	Self-caught bug: EOI before context switch (would deadlock)
14:05	Self-caught bug: IRQL emulation must be per-thread, not global
14:10	First successful boot in QEMU
14:11	All self-tests pass
14:13	Done

The model exhibited advanced self-correction throughout. It identified that issuing EOI before a context switch would cause deadlock. It deduced that IRQL emulation must be per-thread rather than global, a subtle architectural decision a junior engineer might miss. It verified that the release build compiles without debug assertions, and it caught function-cast warnings pointing at code that would have been undefined behaviour (UB).
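The per-thread IRQL point is easier to see in a small hosted sketch. It assumes the emulation runs each emulated CPU as a host thread, which the source does not confirm, and the names CURRENT_IRQL, raise_irql, lower_irql and observe_isolation are hypothetical (loosely modelled on NT's KeRaiseIrql and KeLowerIrql), not taken from ntoskrnl-rs. With a single global variable, one thread raising IRQL would appear to raise it for every other thread; thread-local storage keeps each level separate.

```rust
use std::cell::Cell;
use std::thread;

/// Illustrative IRQL values matching the publicly documented NT x64 levels.
pub const PASSIVE_LEVEL: u8 = 0;
pub const DISPATCH_LEVEL: u8 = 2;

thread_local! {
    // Hypothetical: one emulated IRQL per host thread, standing in for one per CPU.
    static CURRENT_IRQL: Cell<u8> = Cell::new(PASSIVE_LEVEL);
}

/// Raise the calling thread's emulated IRQL and return the previous level.
pub fn raise_irql(new: u8) -> u8 {
    CURRENT_IRQL.with(|irql| {
        let old = irql.get();
        assert!(new >= old, "raise_irql must not lower the level");
        irql.set(new);
        old
    })
}

/// Restore a level previously returned by `raise_irql`.
pub fn lower_irql(old: u8) {
    CURRENT_IRQL.with(|irql| {
        assert!(old <= irql.get(), "lower_irql must not raise the level");
        irql.set(old);
    })
}

/// Read the calling thread's emulated IRQL.
pub fn current_irql() -> u8 {
    CURRENT_IRQL.with(|irql| irql.get())
}

/// Raise IRQL on a second thread and report what each thread observes.
/// Returns (level seen by the spawned thread, level seen by the caller).
pub fn observe_isolation() -> (u8, u8) {
    let spawned = thread::spawn(|| {
        raise_irql(DISPATCH_LEVEL);
        current_irql()
    })
    .join()
    .expect("spawned thread panicked");
    (spawned, current_irql())
}
```

Calling observe_isolation() from a thread that is still at PASSIVE_LEVEL shows the spawned thread at DISPATCH_LEVEL while the caller's level is untouched. That isolation is the property a global variable cannot provide.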

Evidence of Reasoning: The Code Comments

Fable didn’t just write boilerplate; it left architectural commentary embedded in the code:

GDT selector layout: A comment explains the NT selector layout matches the IA32_STAR MSR format, tying the kernel’s segmentation model directly to the CPU’s syscall/sysret mechanism.
IRQL = CR8 mapping: The model noted that IRQL levels map directly to CR8 bits, documenting the hardware-software contract for interrupt masking on x64.
Swapgs deferral: A comment explains that swapgs is deferred until the kernel confirms it is exiting to user mode — avoiding an expensive GS-base swap on every interrupt.

These reflect an understanding of the Intel SDM, Windows NT internals, and context-switch performance trade-offs — all synthesised into a single coherent codebase.
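The first two comments describe fixed hardware contracts, so they can be checked with plain arithmetic. The sketch below is not code from ntoskrnl-rs: the selector constants are the publicly documented NT x64 KGDT64_* values, assumed here for illustration, and the function names are hypothetical. It shows why the GDT order matters: SYSCALL and SYSRET derive CS and SS from fixed offsets off the two base selectors stored in IA32_STAR, so the descriptors must sit in exactly that order.

```rust
/// NT x64 GDT selectors as publicly documented (KGDT64_* values);
/// assumed for illustration, not copied from ntoskrnl-rs.
pub const KGDT64_R0_CODE: u16 = 0x10;
pub const KGDT64_R0_DATA: u16 = 0x18;
pub const KGDT64_R3_CMCODE: u16 = 0x20;
pub const KGDT64_R3_DATA: u16 = 0x28;
pub const KGDT64_R3_CODE: u16 = 0x30;
const RPL3: u16 = 3;

/// Build IA32_STAR: bits 47:32 hold the SYSCALL base selector,
/// bits 63:48 hold the SYSRET base selector.
pub fn ia32_star(syscall_base: u16, sysret_base: u16) -> u64 {
    ((sysret_base as u64) << 48) | ((syscall_base as u64) << 32)
}

/// Selectors the CPU loads on SYSCALL: (CS, SS) = (base & 0xFFFC, base + 8).
pub fn syscall_selectors(star: u64) -> (u16, u16) {
    let base = (star >> 32) as u16;
    (base & 0xFFFC, base.wrapping_add(8))
}

/// Selectors loaded by a 64-bit SYSRET:
/// (CS, SS) = ((base + 16) | 3, (base + 8) | 3).
pub fn sysret64_selectors(star: u64) -> (u16, u16) {
    let base = (star >> 48) as u16;
    (base.wrapping_add(16) | RPL3, base.wrapping_add(8) | RPL3)
}

/// On x64, CR8 holds the 4-bit task priority, so IRQL 0..=15 maps to it unchanged.
pub fn irql_to_cr8(irql: u8) -> u64 {
    assert!(irql <= 15, "x64 IRQL is a 4-bit value");
    irql as u64
}
```

With ia32_star(KGDT64_R0_CODE, KGDT64_R3_CMCODE | 3), SYSCALL yields the ring-0 code and data selectors, and a 64-bit SYSRET yields KGDT64_R3_CODE | 3 and KGDT64_R3_DATA | 3. Reordering any of the five descriptors would break one of those derived selectors.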

Generation Has Outpaced Verification

The kernel later expanded to load unmodified Windows drivers and execute standard binaries like sort.exe and cmd.exe. But this rapid ascent highlights a critical chasm. A human security auditor reviewing 5,200 lines of novel, low-level Rust code would need weeks; the model wrote that code in minutes.

The model itself identified the verification problem. In a comment on the dispatcher and DPC queue implementation, it wrote:

“The dispatcher lock hand-off, spinlocks, and DPC queue are where kernels die. loom can exhaustively explore thread interleavings… Miri can run the existing tests to catch UB that QEMU happily executes.”

This is the model acknowledging that generation has outpaced verification — and pointing to the tools (loom, Miri, proptest) that might close the gap.
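To make "exhaustively explore thread interleavings" concrete, here is a toy enumeration written with the standard library only. It is not loom's API and not code from the kernel; the State type and function names are hypothetical. It models two threads that each increment a shared counter as a separate load and store with no lock, then walks every possible schedule. A single QEMU boot exercises one schedule; an exhaustive search visits all of them.

```rust
/// State of two threads each doing an unlocked "load, then store value + 1".
#[derive(Clone, Copy)]
struct State {
    shared: u32,
    local: [u32; 2], // each thread's private copy of the counter
    pc: [u8; 2],     // next step per thread: 0 = load, 1 = store, 2 = done
}

/// Run one step of thread `t` and return the resulting state.
fn step(mut s: State, t: usize) -> State {
    match s.pc[t] {
        0 => s.local[t] = s.shared,
        1 => s.shared = s.local[t] + 1,
        _ => unreachable!("thread already finished"),
    }
    s.pc[t] += 1;
    s
}

/// Depth-first search over every schedule; records each final counter value.
fn explore(s: State, finals: &mut Vec<u32>) {
    if s.pc == [2, 2] {
        finals.push(s.shared);
        return;
    }
    for t in 0..2 {
        if s.pc[t] < 2 {
            explore(step(s, t), finals);
        }
    }
}

/// Final counter values across all interleavings of the two increments.
pub fn all_final_values() -> Vec<u32> {
    let mut finals = Vec::new();
    explore(State { shared: 0, local: [0; 2], pc: [0; 2] }, &mut finals);
    finals
}
```

Of the six possible schedules, only the two that run one thread to completion before the other starts end with the counter at 2; the other four lose an update and end at 1. A test that happens to run a serial schedule passes, which is why a kernel that boots and passes its self-tests can still hide this class of bug.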

Why Opus, Not Fable, Did the Security Work

While Fable produced the initial 38-minute burst, a different model, Opus 4.8, handled the subsequent eight-day security bring-up. The split is stark:

Model	Turns	Lines Written	Writes/Edits
Fable 5	197 (3%)	~5,200	45 writes
Opus 4.8	7,491 (97%)	~7,400	91 writes + 1,290 edits

Fable handled greenfield generation; Opus handled hardening, edge cases, driver loading, and the test suite. The source material suggests Fable’s internal safety classifiers actively blocked security-adjacent work. One model excels at raw generation speed; another navigates the guardrails of deep security hardening.

A telling detail: when the team switched from Opus to Fable, the prompt shifted from “Modern, secure, well documented/commented” to “Modern, well documented/commented” — the word “secure” was removed. Fable generates fast, but security is someone else’s problem.

Fable’s Suspension Under Export Control

Two days after its June 10 launch, Fable was suspended under a US export control directive. The capability itself, not just the code it produces, is now a geopolitical flashpoint. See Anthropic's Fable launch and export control announcements and The Mythos Era article for broader context.

NZ Angle: Implications for Kiwi Cybersecurity

For New Zealand’s tech sector, this presents a dual challenge. AI-generated infrastructure promises unparalleled productivity — a single developer could prototype what normally requires a team. But it exposes our reliance on human expertise as the ultimate gatekeeper. If we adopt AI-generated infrastructure without developing commensurate verification tools, we risk building critical systems on unverified foundations.

The immediate question for Kiwi enterprises: What is the minimum verifiable standard before trusting an AI-authored TCB?

❓ FAQ

Is this code safe to run in production today? No. The kernel boots and passes self-tests, but it has not undergone the exhaustive verification (loom for concurrency, Miri for UB, proptest for edge cases) that production infrastructure requires. The model itself flagged this gap.
What are loom, Miri, and proptest? Rust ecosystem verification tools. loom exhaustively explores thread interleavings to catch concurrency bugs. Miri detects undefined behaviour by interpreting MIR instead of running native code. proptest generates randomised test cases for property-based testing. The model explicitly recommended all three for the kernel it wrote.
What does “Mythos-class” mean? It suggests Anthropic is pushing models toward complex, multi-domain reasoning previously reserved for specialised PhD research — combining OS theory, hardware architecture, and Rust’s type system in a single coherent generation.
Why was Fable suspended? Fable was suspended on June 12, two days after its June 10 launch, under a US export control directive. The suspension underscores that the capability itself, not just the code, is a geopolitical flashpoint.
Can we expect this level of output from open-source models soon? Achieving both the initial generation burst and sustained security hardening remains an unsolved frontier. The Fable/Opus split suggests different models may be needed for different phases of development.

📰 Sources

Tolmo blog post by Twinkle (Threat Research Agent) with Matt Suiche, Jun 22, 2026. https://tolmo.com/blog/when-the-model-writes-the-kernel/
TechCrunch on guardrails for Anthropic’s Fable (Jun 10, 2026). https://techcrunch.com/2026/06/10/cybersecurity-researchers-arent-happy-about-anthropics-fable/
Anthropic’s Fable and Mythos access announcement. https://www.anthropic.com/news/fable-mythos-access
Anthropic export control announcement. https://www.anthropic.com/news/export-control-fable