
Airbnb Says AI Writes 60% of Its New Code — And Microsoft Research Says 25% of It Gets Corrupted

Airbnb says AI writes 60% of its new code. Microsoft Research says LLMs corrupt 25% of documents in long editing sessions. Both are true. Here's what that means for developers.

Airbnb · AI Coding · Claude Code · AI Developers · Software Engineering

Airbnb CEO Brian Chesky dropped a number on the company’s Q1 2026 earnings call that would have sounded like science fiction two years ago: 60% of Airbnb’s new code was written by AI. Not tested by AI. Not reviewed by AI. Written by AI.

The same week, Microsoft Research published a paper showing that frontier LLMs corrupt 25% of documents during long editing workflows. Both things are true. And that contradiction is the most important story in software engineering right now.

🔍 THE BOTTOM LINE

AI is writing most of the new code at one of the world’s biggest tech companies. It’s also quietly wrecking a quarter of the documents it touches. The future isn’t AI replacing developers — it’s developers learning to work with tools that are incredibly powerful and dangerously unreliable at the same time.


60% — The Number That Changed Everything

Airbnb isn’t the first company to boast about AI-written code. Google says over 30% of its new code is AI-generated. Microsoft has claimed similar numbers. Spotify said in February that its best developers hadn’t written a line of code since December, thanks to AI.

But Airbnb’s 60% is a step-function jump. And Chesky wasn’t shy about what it means for headcount:

“Where you might have needed a team of 20 engineers before, an engineer can now spin up agents to do a lot of work under supervision.”

Let me translate that from CEO-speak: we don’t need 20 engineers for this anymore. One engineer with AI agents can do the work. The other 19? Their roles are being rethought, not eliminated — but the direction is clear.

Airbnb’s specific use case is telling. The company uses AI most heavily for building tools for its API partners — the property management software companies that hosts use to manage listings. These are integration tools, boilerplate-heavy, well-scoped. The kind of code that AI is genuinely good at: connecting APIs, translating data formats, generating CRUD interfaces.
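To make that "well-scoped" category concrete, here is the kind of translation shim AI assistants tend to produce reliably. This is a minimal illustrative sketch in Python; every field name and the schema itself are invented for the example (Airbnb's actual partner APIs aren't public in this form):

```python
# Hypothetical example: normalise a partner listing payload into an
# internal schema. All field names are invented for illustration.

def normalise_listing(partner_payload: dict) -> dict:
    """Translate a partner API listing into our internal format."""
    return {
        "listing_id": str(partner_payload["id"]),
        "title": partner_payload.get("name", "").strip(),
        "nightly_price_cents": round(float(partner_payload["price"]) * 100),
        "max_guests": int(partner_payload.get("capacity", 1)),
        # Deduplicate and sort so downstream comparisons are stable
        "amenities": sorted(set(partner_payload.get("amenities", []))),
    }

print(normalise_listing({
    "id": 42,
    "name": " Seaside Bach ",
    "price": "185.50",
    "amenities": ["wifi", "wifi", "deck"],
}))
```

Code like this is pattern-dense and judgment-light, which is exactly why LLMs handle it well: the shape of the solution is fully determined by the two schemas.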

Chesky also noted that Airbnb’s AI customer support bot now handles 40% of issues without human escalation, up from 33% earlier in 2026. The company is AI-first in everything from code to customer service.

The DELEGATE-52 Reality Check

Here’s where it gets uncomfortable. The same week Airbnb was celebrating AI-written code, Microsoft Research released DELEGATE-52 — a benchmark that tests what happens when you let LLMs edit documents over long workflows.

The findings are sobering:

  • Frontier models corrupt 25% of document content during long editing sessions
  • This includes Gemini 3.1 Pro, Claude 4.6 Opus, and GPT 5.4 — the best models available
  • Other models fail even more severely
  • Agentic tool use doesn’t help — giving the AI tools makes it worse, not better
  • Corruption gets worse with larger documents, longer interactions, and distractor files

The paper’s key insight: the errors are sparse but severe. The AI doesn’t make lots of small mistakes. It makes catastrophic ones — silently deleting important content, introducing factual errors, or breaking formatting — and then compounds them over subsequent interactions.

What is DELEGATE-52? It’s a benchmark from Microsoft Research that simulates long delegated workflows where an AI edits documents across 52 professional domains — from coding to crystallography to music notation. It measures whether the AI faithfully completes the task or silently corrupts the document. The answer, overwhelmingly, is corruption.
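The benchmark itself isn't something to reproduce here, but the failure mode it measures — silent large deletions — is cheap to guard against. A minimal sketch of such a guard using Python's standard difflib, assuming plain-text documents and a threshold chosen arbitrarily for illustration:

```python
import difflib

def deletion_ratio(before: str, after: str) -> float:
    """Fraction of the original text that an edit removed or replaced."""
    matcher = difflib.SequenceMatcher(a=before, b=after)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - kept / max(len(before), 1)

def guard_edit(before: str, after: str, threshold: float = 0.3) -> str:
    """Reject AI edits that silently drop too much of the document."""
    if deletion_ratio(before, after) > threshold:
        raise ValueError("edit removed too much content; needs human review")
    return after

doc = "Intro paragraph.\nKey clause: payment due in 30 days.\nFooter."
guard_edit(doc, doc.replace("30 days", "45 days"))  # small change passes
# guard_edit(doc, "Footer.")  # would raise: most of the document vanished
```

A check like this won't catch introduced factual errors, but it does catch the "silently deleting important content" class of failure the paper describes — for almost no engineering cost.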

This isn’t a theoretical problem. If 60% of Airbnb’s new code is AI-written, and LLMs corrupt 25% of what they touch in long workflows, the math is brutal. Even if code editing is less error-prone than general document editing (which is plausible — code has tests), the compounding risk is real.
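The brutal math can be made concrete. Suppose — hypothetically, since the paper reports a session-level figure rather than a per-edit one — that each delegated edit carries a small independent chance of introducing a silent error. The odds of a still-clean document then decay geometrically:

```python
# Hypothetical per-edit silent-error rate, chosen so that roughly 14
# edits reproduce the 25% session-level corruption figure. Illustration
# only; the real per-edit rate is not reported.
P_ERROR_PER_EDIT = 0.02

def clean_probability(n_edits: int, p: float = P_ERROR_PER_EDIT) -> float:
    """Chance the document survives n edits with no silent error."""
    return (1 - p) ** n_edits

for n in (1, 5, 14, 50):
    print(f"{n:>3} edits -> {clean_probability(n):.1%} chance still clean")
```

Under these toy numbers, a 50-edit session leaves the document intact barely a third of the time — which is why long delegated workflows, not single completions, are where the risk concentrates.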

Why Both Numbers Are True

The trick is that Airbnb’s 60% and Microsoft’s 25% aren’t contradictory. They’re describing different parts of the same pipeline:

  1. AI is great at generating initial code. Boilerplate, APIs, data transformations, test scaffolding — these are well-understood patterns that LLMs can produce fast and mostly correctly.

  2. AI is terrible at maintaining code over time. The DELEGATE-52 findings show that the more you delegate iterative editing, the more drift and corruption accumulates. Each interaction has a small chance of introducing a silent error, and those errors compound.

This maps to what developers are already experiencing. AI generates code quickly. But reviewing, debugging, and maintaining AI-generated code takes significant human effort. The productivity gain is real — but it’s front-loaded. You save time writing. You still need to spend time verifying.

The companies winning with AI coding are the ones treating it as a force multiplier for experienced engineers, not a replacement for them. Airbnb’s “one engineer with AI agents” framing is accurate — but that one engineer needs to be very, very good at catching what the AI gets wrong.

What This Means for NZ Developers

New Zealand’s tech sector has always punched above its weight with small, skilled teams. AI coding tools amplify that advantage — a five-person team in Wellington can now produce what used to require 15 people.

But there’s a catch. The DELEGATE-52 research shows that experience matters more, not less, in an AI coding world. Junior developers who rely on AI to generate code they don’t fully understand are building on sand. The senior developer who can spot the 25% corruption — that’s who companies need.

For NZ’s growing number of AI-focused startups and the traditional tech companies pivoting to AI tools: invest in code review culture. The more code AI writes, the more important human review becomes. Not less. More.

This is also relevant to the 93,000+ tech layoffs in 2026 — the restructuring is real. Companies like Airbnb, Cloudflare, and Meta are explicitly replacing headcount with AI. But the roles being cut aren’t random. They’re the ones where AI can demonstrably do the work at acceptable quality. The safety net is depth of expertise.

The Honest Take

Here’s what nobody wants to say out loud: 60% AI-written code is impressive and terrifying at the same time. It’s impressive because it shows how fast AI coding has matured. It’s terrifying because it means most companies are building critical infrastructure on code that no human fully understands.

Airbnb’s Chesky gets this. His caution about chatbots for travel — “too much text, no direct manipulation, poor comparison, single-player” — shows someone who understands AI’s current limitations. The 60% number isn’t 100% because some code requires the judgment, context, and verification that only humans provide.

The companies that will thrive aren’t the ones that maximise AI code generation. They’re the ones that build the best systems for catching what AI gets wrong. Verification, testing, code review, and experience — these are the premium skills now. Not faster typing. Not better prompting. The ability to look at AI-generated output and say, with confidence, “this is wrong, and here’s why.”

That’s the skill NZ developers should be building. Not how to write prompts, but how to audit AI output. The 25% corruption rate is, perversely, good news for careful engineers — it means the humans who can catch it are worth more than ever.


❓ Frequently Asked Questions

Q: Does 60% AI-written code mean 60% fewer developers? No. It means the same number of developers produce more output — or fewer developers produce the same output. Companies like Airbnb are choosing the second option, but many companies are using AI to accelerate without cutting. The net effect depends on the company.

Q: What is DELEGATE-52 and why should I care? DELEGATE-52 is a Microsoft Research benchmark that measures how well LLMs handle long delegated editing tasks. The answer: not well. Frontier models corrupt 25% of documents. If you’re letting AI write your code (or anything else) without careful review, you’re accepting that roughly 1 in 4 long sessions may end in silent corruption.

Q: What should NZ tech companies do about AI coding? Adopt it — but invest heavily in review. AI coding tools like Claude Code, GitHub Copilot, and Cursor are force multipliers. But they’re multipliers for good engineers, not replacements for them. Prioritise code review, testing culture, and developer experience. The premium skill is judgment, not speed.

Q: Is Airbnb’s 60% number trustworthy? It’s self-reported and covers Q1 2026 only. “AI-written” likely includes AI-suggested code that developers accepted, not just fully autonomous generation. Google’s 30% and Microsoft’s numbers use similar methodology. The trend is real even if the exact percentage varies.


🔍 THE BOTTOM LINE

Airbnb says AI writes 60% of its new code. Microsoft Research says AI corrupts 25% of what it touches. Both are true. The future of software engineering is humans and AI working together — and the humans who can catch the 25% are worth more than ever.


📰 SOURCES

Sources: TechCrunch, Microsoft Research, arXiv