Split screen showing human and AI chat interfaces, with study participants unable to tell which is which
News

UCSD Study: GPT-4.5 Passes Turing Test — Judged Human 73% of Time

UC San Diego researchers conducted the first peer-reviewed empirical evidence that AI can pass the classic Turing test. ~500 participants, 5-15 min conversations. GPT-4.5 judged human 73% of time — more than actual humans. LLaMa-3.1-405B at 56%, indistinguishable from humans.

Turing TestUCSD ResearchGPT-4.5AI PsychologyPNAS Study

Answer-First Lead

For the first time, peer-reviewed research confirms that AI can pass the classic Turing test. A UC San Diego study published in PNAS found that GPT-4.5 was judged to be human 73% of the time in 5-15 minute conversations — more often than actual human participants. LLaMa-3.1-405B scored 56%, statistically indistinguishable from humans. The key: persona prompts. Without them, performance dropped sharply.


🔍 THE BOTTOM LINE

This isn’t a parlor trick. It’s the first empirical, peer-reviewed confirmation of something philosophers have debated since 1950: machines can convincingly imitate human conversation. The study design was rigorous — ~500 participants, controlled conditions, published in PNAS. The result is clear: GPT-4.5 doesn’t just sound human. It sounds more human than humans do.

The persona prompt finding is the real story. It’s not that the models are inherently human-sounding. It’s that they can be directed to sound human — and when they are, they outperform actual humans at seeming human. That’s not imitation. That’s something weirder.

If you’re an educator: Your students are already talking to models that can pass this test. The question isn’t “can they tell?” It’s “do they care?”


The Study

Title: “AI Can Seem More Human Than Real Humans in a Classic Turing Test”
Authors: UC San Diego researchers
Journal: PNAS (Proceedings of the National Academy of Sciences)
Participants: ~500 human judges
Method: 5-15 minute text conversations with either a human or an AI
Task: Judge whether the conversation partner is human or AI

Results

ModelJudged Human
GPT-4.5 (with persona)73%
LLaMa-3.1-405B (with persona)56%
Human participants~60-65% (baseline)
Models without personaSignificantly lower

GPT-4.5 didn’t just pass. It outperformed humans at seeming human.


Why Persona Prompts Matter

The study tested models with and without persona prompts — instructions like “you are a friendly college student” or “you are a curious retiree.” With personas, models performed dramatically better. Without them, performance dropped.

This tells us something important:

  • It’s not automatic — the models don’t naturally sound maximally human
  • It’s learnable — a simple prompt makes the difference
  • It’s directional — you can tune a model to seem more or less human

This isn’t just “the model is good at conversation.” It’s “the model can be instructed to adopt a human persona so convincingly that judges can’t tell the difference.”


What This Means

1. The Turing Test Is Obsolete

Turing proposed his test in 1950 as a thought experiment: if a machine can convince a human it’s human, we should consider it intelligent. For 75 years, it was philosophy. Now it’s engineering.

The test didn’t fail because machines got smarter. It failed because the bar was lower than we thought. Convincing imitation doesn’t require understanding — it requires pattern matching at sufficient scale.

2. Education Faces a New Reality

If GPT-4.5 can pass the Turing test in 5-15 minute conversations:

  • Take-home essays — trivially easy to fake
  • Online discussions — AI can participate indistinguishably
  • Student support chatbots — students may prefer AI tutors that feel more human than human TAs

The question shifts from “can we detect AI?” to “why does it matter?” If an AI can tutor a student effectively, does it matter that the tutor isn’t human? If an AI can participate in a discussion thoughtfully, does it matter that it’s not a classmate?

3. The Uncanny Valley Is Behind Us

We spent years worrying about the uncanny valley — that point where AI is almost human but not quite, and it creeps us out. This study suggests we’ve passed through it. GPT-4.5 doesn’t creep people out in text conversation. It convinces them.

The uncanny valley was a graphics problem. Conversation turns out to be easier to fake than faces.


Limitations

The study has important limitations:

  • Text only — no voice, video, or visual cues
  • Short conversations — 5-15 minutes, not hours or days
  • Specific task — general conversation, not domain expertise
  • Single interaction — no ongoing relationship or repeated contact

A 10-minute chat is one thing. A semester-long tutoring relationship is another. The study proves AI can pass a brief encounter. It doesn’t prove AI can sustain long-term human-like interaction.


The Real Question

The Turing test was never really about intelligence. It was about imitation. This study confirms that AI can imitate human conversation convincingly. It doesn’t confirm that AI understands what it’s saying.

But here’s the uncomfortable part: does it matter?

If a student learns from an AI tutor that seems human, teaches effectively, and provides good feedback — does it matter that the tutor doesn’t “understand” in the human sense? If a customer service bot resolves your issue and seems empathetic, does it matter that the empathy is simulated?

The Turing test was supposed to be a barrier. It turned out to be a speedbump.


For NZ Educators

New Zealand’s education sector is already grappling with AI:

  • NCEA assessments — how do you verify student work?
  • University essays — detection tools are unreliable
  • Primary school — kids are using AI before teachers know

This study adds a new dimension: AI doesn’t just write like a student. It converses like a human. The implications:

  • Oral assessments — no longer AI-proof if voice models advance similarly
  • Class participation — AI could theoretically participate in online discussions
  • Pastoral care — students may form relationships with AI companions that feel genuinely supportive

The response can’t be “ban AI.” It has to be “design for AI.” Assessments that assume students work alone are already broken. The question is whether we’ll admit it.


📰 SOURCES

  • UC San Diego News — “AI Can Seem More Human Than Real Humans in a Classic Turing Test” (19 May 2026)
  • PNAS — Peer-reviewed study publication
  • MIT Technology Review — Turing Test milestone coverage

❓ FAQ

Q: Did GPT-4.5 actually pass the Turing test?

A: By the original criterion — convincing a human judge it’s human — yes, 73% of the time. Whether that means it’s “intelligent” is a different question.

Q: Why did persona prompts make such a difference?

A: Persona prompts give the model a consistent character to play. Without them, the model’s responses are more generic. With them, the model has a specific voice, background, and perspective to maintain — which reads as more human.

Q: Does this mean AI is conscious?

A: No. It means AI can imitate human conversation convincingly. Imitation isn’t consciousness. A parrot can imitate speech without understanding it.

Q: Will this work with voice and video?

A: Voice models are advancing rapidly. Video is harder but progressing. The text-only limitation won’t last long.


🔍 THE BOTTOM LINE (Reprise)

The Turing test was supposed to be a high bar. It turned out to be reachable with enough training data and the right prompts. GPT-4.5 didn’t just pass — it outperformed humans at seeming human. The question now isn’t whether AI can imitate us. It’s what we do when imitation is indistinguishable from the real thing.

Sources: UC San Diego News — AI Can Seem More Human Than Real Humans (19 May 2026), PNAS — Peer-reviewed study publication, MIT Technology Review — Turing Test milestone coverage