AI Coding Agents Are About to Cost More Than the Developers They Replace

AI coding agents burn 10-100x more tokens than chat-based coding. A single complex task can cost $5-20. At that rate, the agent costs more than the developer.

AI coding agents, OpenAI, Codex, AI economics, token costs

AI coding agents are about to cost more than the developers they replace. Token consumption for agentic workflows runs 10–100× higher than chat-based coding, and a single complex task can burn $5–20 in API costs. Multiplied across a sprint, the agent is now the most expensive line item on the project — not the human.

🔍 THE BOTTOM LINE

Agentic AI coding has crossed a price threshold where the tool costs more than the labour it was meant to replace. This isn’t a transient inefficiency to be optimised away; it’s structural to how agents work. Smarter models and longer context windows raise the cost ceiling rather than lower it, and better routing trims the bill without changing that picture.

The Math Doesn’t Work

At the heart of agentic coding is a simple economic problem: each task requires many LLM calls, not one.

A chat interaction might burn 1,000 tokens. An agent doing the same logical work — reading files, exploring a codebase, planning changes, executing edits, running tests, debugging failures — consumes 50,000 to 500,000 tokens for the equivalent outcome. At current API rates, that’s $0.15 to $1.50 for chat-based help versus $5 to $20 for an agent doing the same job. Enterprise deployments regularly report $50+ for a single complex refactor.

Stack that against a junior developer in New Zealand. Entry-level software engineer salaries sit at $55,000–$70,000 NZD, roughly $26–34 per hour of base pay over a 2,080-hour working year, before overheads. An agent burning $20 per task, doing 50 tasks across a sprint, costs $1,000, and that spend lands on top of the wage bill rather than replacing it: the human still has to verify the output, write the requirements, and fix the failures. The promise was always that agents would handle the boring work while humans did the thinking. The data says agents are doing neither cheaply.
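The arithmetic behind that comparison is easy to reproduce. The sketch below uses the salary range and per-task cost quoted above; the 2,080-hour working year and the 80-hour (two-week) sprint are assumptions rather than figures from the article, and currencies are left as quoted.

```python
# Assumptions (not from the article): 2,080 paid hours a year, 80-hour sprint.
HOURS_PER_YEAR = 2080
SPRINT_HOURS = 80


def hourly_rate(annual_salary: float) -> float:
    """Base pay per hour, before overheads."""
    return annual_salary / HOURS_PER_YEAR


def sprint_agent_spend(cost_per_task: float, tasks: int) -> float:
    """Total API spend for a sprint at a flat per-task cost."""
    return cost_per_task * tasks


low, high = hourly_rate(55_000), hourly_rate(70_000)
agent = sprint_agent_spend(20.0, 50)

print(f"Junior developer: ${low:.2f} to ${high:.2f} per hour")
print(f"Wages per sprint: ${low * SPRINT_HOURS:,.0f} to ${high * SPRINT_HOURS:,.0f}")
print(f"Agent spend per sprint: ${agent:,.0f}")
print(f"Combined, low end: ${low * SPRINT_HOURS + agent:,.0f}")
```

The combined figure is the one that matters for budgeting: as long as a developer reviews and repairs the agent's work, the two costs add up rather than substitute for each other.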

As our earlier coverage of GitHub showed, agent adoption is genuinely accelerating — but cost is now the brake nobody is talking about.

Why Agents Burn Tokens

Three structural factors compound:

  1. Multiple calls per task. Agents don’t answer questions; they act. Read a file (call 1), interpret it (call 2), plan the change (call 3), execute it (call 4), run tests (call 5), read failures (call 6), fix and retry (call 7+). A “single task” is rarely a single API request — it’s often a dozen.

  2. Context window bloat. Modern agents ship hundreds of thousands of tokens of context per call — codebases, prior attempts, error messages, system prompts. Longer context means more tokens billed per call. Anthropic, OpenAI, and Google have all raised prices on long-context tiers over the past year, not lowered them.

  3. Failure loops. When an agent gets stuck — and they do — it retries. A debugging session that takes 15 attempts isn’t 1× the cost, it’s 15×. Industry analysts call this the “token wall”: budget caps that force teams to abandon agent runs mid-task because the spend has already exceeded the value of the work.
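The way these three factors multiply can be sketched in a few lines. This is a deliberately simplified model, not a description of how any particular agent bills: it assumes every attempt repeats the same number of calls with the same context size, and the $10 per million tokens rate and the 10,000-token context are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    calls_per_attempt: int          # read, interpret, plan, edit, test, ...
    tokens_per_call: int            # context shipped on every call
    attempts: int                   # 1 = first-time success, 15 = a long failure loop
    usd_per_million_tokens: float   # hypothetical blended rate

    def total_tokens(self) -> int:
        # The three factors multiply: calls x context x retries.
        return self.calls_per_attempt * self.tokens_per_call * self.attempts

    def cost_usd(self) -> float:
        return self.total_tokens() / 1_000_000 * self.usd_per_million_tokens


first_try = AgentRun(calls_per_attempt=7, tokens_per_call=10_000,
                     attempts=1, usd_per_million_tokens=10.0)
stuck = AgentRun(calls_per_attempt=7, tokens_per_call=10_000,
                 attempts=15, usd_per_million_tokens=10.0)

for label, run in (("first try", first_try), ("15 attempts", stuck)):
    print(f"{label}: {run.total_tokens():,} tokens, ${run.cost_usd():.2f}")
```

Under these assumptions the stuck run uses exactly 15 times the tokens of the clean one. In practice context tends to grow as prior attempts and error messages accumulate, so a real failure loop is likely to cost more than this flat model suggests.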

This connects directly to what we covered in The Tokenpocalypse: Corporate AI Spending Hits a Wall as Trivial Tasks Drain Budgets — corporate AI budgets ballooning faster than any offsetting productivity gain.

The Router Illusion

The industry’s answer is routing: pick the cheapest model for each step, cache aggressively, compress context before each call. The Register’s coverage of nx.dev’s Polygraph meta-harness and Claude/Codex/Cursor routing optimisations suggests these can cut costs 30–60%.

Even at that saving, the math doesn’t change. An agent that costs $20 today drops to $8–14 with routing. That’s still 2–3× the cost of the human. Routing optimisation is fighting the cost curve — not inverting it.
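The $8–14 range follows directly from applying the reported 30–60% saving to a $20 task. A minimal sketch of that calculation, with a hypothetical helper name and the article's 50-task sprint reused as the volume:

```python
def routed_cost_range(baseline_usd: float, min_saving: float, max_saving: float):
    """Return (lowest, highest) per-task cost after a routing saving.

    Savings are fractions, e.g. 0.30 for 30%.
    """
    lowest = baseline_usd * (1 - max_saving)
    highest = baseline_usd * (1 - min_saving)
    return lowest, highest


low, high = routed_cost_range(20.0, 0.30, 0.60)
tasks_per_sprint = 50  # same volume as the earlier sprint example

print(f"Per task after routing: ${low:.2f} to ${high:.2f}")
print(f"Per sprint after routing: ${low * tasks_per_sprint:,.0f} "
      f"to ${high * tasks_per_sprint:,.0f}")
```

Routing scales the bill by a constant factor; it does nothing about the number of calls, the context size, or the retries that generate the bill in the first place.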

OpenAI’s research paper “The Shift to Agentic AI: Evidence from Codex” acknowledges this dynamic but frames it as a productivity story, not a cost story. Their data shows agents complete tasks faster. They don’t show the bill for those completions. GitHub’s enterprise reports reveal the same gap: productivity wins claimed, but the invoice keeps growing in the footnotes.

NZ Angle

New Zealand’s tech sector runs on margins Silicon Valley forgot existed. A typical Kiwi SaaS company clears 60–70% gross margin; its US counterparts hit 80%. Local consultancies charge $120–180/hour for senior developers, the price point agentic AI was supposed to disrupt.

It hasn’t, and the economics haven’t shifted for NZ SMEs because they were never the target market for bleeding-edge agents. What has shifted is the cost of evaluating the technology. A CTO trying to assess whether Codex or Claude Code works for their team now needs to budget $500–2,000 in API spend per pilot — money most NZ startups can’t spend without board approval.

There’s also a less-discussed risk: OpenAI documented a Codex bug that destroyed a developer’s drive mid-task. NZ businesses with thinner technical depth don’t have the redundancy to absorb that kind of incident. When the agent breaks production at 2am, you need a human who can fix it — and you needed that human on payroll anyway.

❓ FAQ

Are agentic AI coding tools getting cheaper over time? API prices per token are falling. Total cost per task is not, because agents consume more tokens per task over time. Efficiency gains are real, but they’re being absorbed by expanded use — the classic Jevons paradox applied to software.

Won’t better models fix the cost problem? Better models are more expensive per token, not less. GPT-5-class models cost 5–10× GPT-4-mini. When agents route to them for hard tasks, per-task cost climbs even as the underlying call cost falls.

Is this just a transitional problem while the technology matures? The structural factors — multiple calls, long context, failure loops — are inherent to how agentic systems work. They can be optimised but not eliminated. The cost floor is much higher than the industry priced in during 2023–2024.

What about local models — running Llama at home? Local models cut the per-token bill to zero but add hardware, electricity, and maintenance costs. For teams already paying developers, those costs often exceed the API alternative — especially for the long-context tasks agents require.

Should NZ businesses adopt agentic coding tools now? Carefully, with explicit cost ceilings per task. Treat the agent like a contractor with a rate card, not a free assistant. Measure cost-per-completed-task, not just completion rate.
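One way to put a rate card on an agent is sketched below. The class and function names are hypothetical and not tied to any vendor's API; the idea is simply to stop a run once its ceiling would be breached, and to divide total spend (including abandoned runs) by the number of tasks that actually finished.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskBudget:
    ceiling_usd: float
    spent_usd: float = 0.0

    def charge(self, call_cost_usd: float) -> bool:
        """Record a call's cost; return False if it would exceed the ceiling."""
        if self.spent_usd + call_cost_usd > self.ceiling_usd:
            return False  # caller should stop the run here
        self.spent_usd += call_cost_usd
        return True


def cost_per_completed_task(runs: list[tuple[float, bool]]) -> Optional[float]:
    """runs holds (spent_usd, completed) pairs; failed runs still count as spend."""
    total_spend = sum(spent for spent, _ in runs)
    completed = sum(1 for _, done in runs if done)
    return total_spend / completed if completed else None


budget = TaskBudget(ceiling_usd=5.0)
print(budget.charge(3.0))   # within the ceiling
print(budget.charge(2.5))   # would take spend to 5.50, so it is refused

# Illustrative figures only: two completed runs and one abandoned at its cap.
runs = [(4.0, True), (10.0, False), (6.0, True)]
print(f"Completion rate: {sum(done for _, done in runs) / len(runs):.0%}")
print(f"Cost per completed task: ${cost_per_completed_task(runs):.2f}")
```

In the illustrative data, a two-in-three completion rate looks respectable, yet each finished task effectively cost $10 once the abandoned run is counted. That gap is why cost-per-completed-task is the more honest number.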

🔍 THE BOTTOM LINE

The agentic coding industry is selling productivity without pricing in its cost. Until vendors publish cost-per-completed-task alongside their productivity headlines, buyers are flying blind. New Zealand businesses — thinner margins, smaller teams, less redundancy — should pilot narrowly, measure ruthlessly, and ignore the press releases. The promise that AI would make software cheaper was always aspirational. The data says it hasn’t, and on the current trajectory, it won’t.

📰 Sources

The Register, OpenAI, GitHub, nx.dev