The UK Just Put an AI Chatbot in Front of Every Citizen — Here's What NZ Should Learn From It

The UK's GOV.UK Chat, powered by Claude, just went live for all app users with 90% accuracy after 26,000 test questions. Here's why it matters for NZ.

Tags: AI in Government, GOV.UK Chat, Anthropic Claude, New Zealand, Digital Government

The UK government’s AI chatbot, GOV.UK Chat, is now live for all 563,000 users of the GOV.UK App, making the UK the first major Western government to put a consumer-facing AI assistant in front of its entire citizen base. Built on Anthropic’s Claude after switching from GPT, it achieved 90% accuracy across 26,000 test questions. The question for New Zealand isn’t whether to follow, but how fast.


🔍 THE BOTTOM LINE

The UK just proved that AI chatbots can work in government: 90% accuracy, 508 jailbreak attempts all blocked, and 73% of users finding it useful. NZ’s digital government teams could learn a lot from this playbook.


What Just Happened

After two and a half years of development and two public pilot phases involving 10,000 users and 26,000 questions, the UK government has fully integrated GOV.UK Chat into its citizen-facing app.

Technology Secretary Liz Kendall put it plainly: “For too long, navigating government has felt like a full-time job. Whether you’re a parent trying to find out what childcare you’re entitled to, a first-time buyer working out which schemes you can access, or someone approaching retirement, you shouldn’t have to spend time trawling through hundreds of web pages to get a straight answer. GOV.UK Chat changes that.”

The chatbot answers questions about government services (tax, benefits, driving, housing, retirement), drawing exclusively on 80,000 pages of official guidance out of GOV.UK’s 700,000 total pages.

The Numbers That Matter

Metric | Value
GOV.UK App users | 563,000 registered
Pilot participants | 10,000+
Questions asked in pilots | 26,000
Answer accuracy | 90% (up from 76%)
User satisfaction | 64%
Users finding it useful | 73%
Jailbreak attempts | 508 (all blocked)
In-scope answer rate | 88%

Why Claude, Not GPT

Here’s a detail worth noting: GOV.UK Chat started life on the same technology as ChatGPT but switched to Anthropic’s Claude for the production deployment. The Government Digital Service (GDS) is now using Amazon’s Bedrock platform to host Claude models.

Why the switch? The GDS blog doesn’t spell it out, but the context is telling. Claude has built a reputation for being more controllable and less prone to hallucination in enterprise settings, which is exactly what you want when a government is advising citizens about their benefits, tax obligations, and legal rights.

The system is designed to allow model upgrades as new versions become available. Smart architecture choice — you don’t want to be locked into one LLM provider when this space moves as fast as it does.
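To make the "no lock-in" point concrete, here is a minimal sketch of the general pattern: application code depends on a small interface, and the model behind it is a swappable detail. This is not GDS's actual code. `ChatModel`, `EchoModel`, and `ChatService` are hypothetical names, and the stand-in model only echoes its prompt where a real backend would call a hosted LLM.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class EchoModel:
    """Illustrative stand-in; a real backend would call a hosted model here."""

    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


@dataclass
class ChatService:
    """Application code only knows about the ChatModel interface."""

    model: ChatModel

    def answer(self, question: str) -> str:
        return self.model.complete(question)


service = ChatService(model=EchoModel("model-a"))
first = service.answer("How do I renew my passport?")

# Swapping the backend touches one assignment, not the application logic.
service.model = EchoModel("model-b")
second = service.answer("How do I renew my passport?")

print(first)
print(second)
```

The design choice is that `ChatService.answer` never changes when the model does, so an upgrade or a provider switch becomes a configuration decision instead of a rewrite.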

What Worked

Accuracy improvement: Going from 76% to 90% accuracy over the pilot period shows the team took iteration seriously. They used a combination of subject matter experts and automated evaluation tools, only rating an answer as “accurate” if it met all the standards of published content.
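The "meets all the standards" rule is stricter than it sounds, and a small sketch shows why. The criterion names and the four sample ratings below are invented for illustration; they are not GDS's rubric or data. The only thing taken from the pilot is the rule itself: an answer counts as accurate only if every check passes.

```python
def is_accurate(checks: dict[str, bool]) -> bool:
    """Strict rule: every criterion must pass. An answer with no checks never counts."""
    return bool(checks) and all(checks.values())


def accuracy_rate(ratings: list[dict[str, bool]]) -> float:
    """Share of answers that pass every criterion."""
    if not ratings:
        raise ValueError("no ratings to score")
    return sum(is_accurate(r) for r in ratings) / len(ratings)


# Illustrative ratings only; criterion names are hypothetical.
ratings = [
    {"factually_correct": True, "complete": True, "cites_source": True},
    {"factually_correct": True, "complete": False, "cites_source": True},
    {"factually_correct": True, "complete": True, "cites_source": True},
    {"factually_correct": True, "complete": True, "cites_source": True},
]

strict = accuracy_rate(ratings)
lenient = sum(r["factually_correct"] for r in ratings) / len(ratings)

print(f"strict: {strict:.0%}, facts-only: {lenient:.0%}")
```

In this toy sample every answer is factually correct, yet the strict score is lower because one answer is incomplete. That gap is why a headline accuracy figure only means something once you know how "accurate" was defined.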

Safety: All 508 jailbreak attempts were blocked. For a government chatbot, that’s table stakes. You can’t have the thing telling people how to commit fraud or giving harmful advice.

Honest about limitations: The team explicitly tells users that GOV.UK Chat can make mistakes, provides accuracy warnings, and built features that make it easy to verify answers at the original source. That’s the right approach for government AI — trust but verify.

Clarifying questions: When users asked ambiguous questions, the system learned to ask clarifying questions rather than guessing. This pushed the in-scope answer rate to 88%.

What Didn’t Work

The “I want to speak to a human” problem: The GDS found some users wanted to speak to an adviser even when the chatbot had answered their question. Government call centres take around 100,000 calls per day — and DSIT estimates up to half could be handled by the chatbot. That’s the efficiency case, but the trust gap remains.

Out-of-scope questions: By design, GOV.UK Chat won’t answer questions outside government guidance. The team is working on improving how the system handles these, but there’s an inherent tension: citizens don’t organise their lives into “in scope” and “out of scope” categories.

The NZ Angle

New Zealand is in a fascinating position here. We have:

  • A relatively small population (5.2 million) that’s digitally savvy
  • A government that’s already invested in digital services through govt.nz
  • An AI adoption rate of 42% among NZ businesses (per AI Forum NZ’s latest report)
  • Call centre bottlenecks across multiple government agencies

The UK model is directly applicable. The key lessons:

  1. Start with pilots. The UK didn’t go straight to a full launch — they ran two public pilots with real users. NZ should do the same.

  2. Draw from official content only. GOV.UK Chat only answers from published government guidance. This is the right constraint for government AI — no training on random internet data, no hallucinated advice about your tax obligations.

  3. Be transparent about accuracy. 90% sounds great until you’re the person who got the wrong answer about your benefit entitlement. Government AI needs accuracy warnings and source links, full stop.

  4. Choose your model carefully. The UK’s move from GPT to Claude for production is instructive. For government use, controllability and reliability matter more than raw capability.

  5. Build for model swapping. The Bedrock architecture lets the UK switch models. NZ should do the same — no lock-in.
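Lessons 2 and 3 can be sketched together: answer only when an official page backs the answer, always return the source link and an accuracy warning, and decline otherwise. The sketch below is a toy under loud assumptions. The pages, the example.org URLs, the word-overlap matching, and the `min_overlap` threshold are all hypothetical; a production system would use proper retrieval and a language model, and nothing here reflects how GOV.UK Chat is actually built.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    text: str


# Hypothetical stand-in for an index of official guidance pages.
OFFICIAL_PAGES = [
    Page("https://example.org/renew-passport", "Renew your passport online or by post"),
    Page("https://example.org/childcare", "Check what help you can get with childcare costs"),
]

WARNING = "This answer may contain mistakes. Check the linked source."


def words(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def find_sources(question: str, pages: list[Page], min_overlap: int = 2) -> list[Page]:
    """Toy retrieval: keep pages sharing at least `min_overlap` words with the question."""
    asked = words(question)
    return [p for p in pages if len(asked & words(p.text)) >= min_overlap]


def answer(question: str) -> dict:
    sources = find_sources(question, OFFICIAL_PAGES)
    if not sources:
        # No official page supports an answer, so decline instead of guessing.
        return {"answer": None, "sources": [], "note": "Out of scope: no official guidance found."}
    return {"answer": sources[0].text, "sources": [p.url for p in sources], "note": WARNING}


in_scope = answer("How do I renew my passport?")
out_of_scope = answer("What is the best pizza topping?")

print(in_scope)
print(out_of_scope)
```

The point of the shape, not the matching logic, is what carries over: every answer arrives with a link the citizen can check, and "I don't know" is a valid response.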

What This Means Globally

The UK is the first major Western government to put a consumer-facing AI chatbot in front of its entire citizen base. That’s a big deal. Other countries are watching closely:

  • Singapore has been a leader in digital government but hasn’t gone this far with generative AI for citizens
  • Estonia, the digital government pioneer, is exploring similar tools
  • Australia has been testing AI in government services but hasn’t launched a comparable citizen-facing tool

The UK’s 90% accuracy benchmark will become the standard that other governments are measured against. If NZ launched something similar and couldn’t hit that number, it would be a step backward.

The Bigger Question

The real question isn’t whether AI chatbots can handle government services. The UK just proved they can — with caveats. The real question is: what happens to the people who currently provide those services?

DSIT says GOV.UK Chat will “free up frontline staff to focus on complex cases where human support is more needed.” That’s the optimistic framing. The realistic framing is that government call centres — which employ thousands of people — will eventually need significantly fewer staff.

For NZ, where government employment is a significant part of the economy, that’s a conversation worth having now, not after the chatbot is live.

❓ Frequently Asked Questions

Q: Could NZ build something similar? Absolutely. NZ’s govtech stack is solid, our population is small enough for a manageable pilot, and we already have government content on govt.nz that could power a chatbot. The UK’s architecture (Bedrock + Claude) could be replicated here. The question is political will, not technical feasibility.

Q: Is 90% accuracy good enough for government? It depends on the stakes. For “how do I renew my passport?” — probably fine. For “am I eligible for this benefit?” — you need those accuracy warnings and source links. The UK approach of being upfront about limitations while continuously improving is the right model.

Q: Why did the UK switch from GPT to Claude? GDS hasn’t publicly explained the switch in detail. But Claude’s reputation for being more controllable and less prone to hallucination in enterprise contexts — plus Anthropic’s focus on safety — likely made it a better fit for a government deployment where getting things wrong has real consequences.


🔍 THE BOTTOM LINE

The UK just showed that government AI chatbots can work — not perfectly, but well enough. 90% accuracy, 508 blocked jailbreaks, 73% of users finding it useful. That’s a real result from a real deployment at real scale. NZ should be building the same thing, learning from the UK’s two and a half years of hard-won experience rather than starting from scratch. The playbook is right there — we just need to pick it up.


Sources

  • PublicTechnology — “Government AI chatbot goes live across GOV.UK App”
  • Inside GOV.UK Blog — “5 things we learned testing GOV.UK Chat”
  • GOV.UK — Algorithmic Transparency Record
  • DSIT press release
  • Civil Service World — “GOV.UK chatbot achieves 90% accuracy”