AI Just Beat Law Professors at Their Own Game — And They Picked It Blind

Answer-First Lead

Stanford Law School just ran a study that should make every law professor uncomfortable: in nearly 3,000 blind head-to-head comparisons, law professors preferred AI-generated answers to student questions over answers written by their own peers 75% of the time. They flagged AI responses as potentially harmful to students just 3.5% of the time — compared to 12% for human-written answers.

🔍 THE BOTTOM LINE

AI didn’t just pass a legal reasoning test — it beat the people who teach legal reasoning, on their own terms, without them knowing. The “AI can’t do nuance” argument just took a serious hit.

The Study

Led by Stanford Law Professor Julian Nyarko and co-authored with researchers from Yale, NYU, and the University of Chicago, the study — published on SSRN — tested whether large language models could serve as effective tutors for contract law courses.

Sixteen law professors across US law schools wrote 40 representative contract law questions that students might ask after class or during office hours. They each wrote their own answers. Then they evaluated anonymised responses, some written by fellow professors and some by AI, without knowing which was which.

The result: AI won 75% of head-to-head matchups.

Why Law Matters More Than Math

Previous AI evaluations have focused on subjects with clear right-or-wrong answers — math, coding, multiple-choice exams. Law is different. Legal reasoning demands judgment, not just recall. Two opposing arguments can both be good. As Yale Law Professor and co-author Sarath Sanga put it:

“In most fields where AI gets tested, there’s a right answer. In law, there often isn’t. What we wanted to know is whether AI can meet the latent professional standard that lawyers use to evaluate each other’s arguments. In this case, the answer was yes.”

The study deliberately chose contract law — a domain requiring nuanced reasoning about competing arguments and ambiguous situations. This wasn’t a test of memorisation. It was a test of whether AI could explain complex legal concepts in ways that help students develop analytical skills.

The Numbers That Matter

Metric	AI Answers	Peer Answers
Win rate in head-to-head	75%	25%
Flagged as pedagogically harmful	3.5%	12%
Comparable to best human instructor	✅	—

Professors didn’t just prefer AI — they found it less likely to mislead students than answers written by other law professors.
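For readers who want to see how figures like these are tallied, here is a minimal sketch in Python. It assumes each blind comparison is stored as one record with the preferred answer and a harm flag for each side. The `Judgment` record, the `summarize` helper, and the toy data are all hypothetical illustrations, not the study's actual data or method. The only numbers taken from the article are the two reported flag rates, 3.5% and 12%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One blind comparison (hypothetical record layout, not the study's schema)."""
    preferred: str       # "ai" or "human"; the label is hidden from the rater
    ai_flagged: bool     # rater marked the AI answer as potentially harmful
    human_flagged: bool  # rater marked the human answer as potentially harmful


def summarize(judgments):
    """Return win rate and flag rates over a list of Judgment records."""
    n = len(judgments)
    if n == 0:
        raise ValueError("need at least one judgment")
    return {
        "ai_win_rate": sum(j.preferred == "ai" for j in judgments) / n,
        "ai_flag_rate": sum(j.ai_flagged for j in judgments) / n,
        "human_flag_rate": sum(j.human_flagged for j in judgments) / n,
    }


# Toy records invented purely for illustration (NOT study data).
toy = [
    Judgment("ai", False, False),
    Judgment("ai", False, True),
    Judgment("human", False, False),
    Judgment("ai", False, False),
]
summary = summarize(toy)

# Rates as reported in the article.
reported_ai_flag = 0.035
reported_human_flag = 0.12

# How many times more often human answers were flagged than AI answers.
flag_ratio = reported_human_flag / reported_ai_flag

print(summary)
print(f"Human answers flagged about {flag_ratio:.1f}x as often as AI answers")
```

On the reported rates, human-written answers were flagged roughly 3.4 times as often as AI answers. Whether the study counted flags per comparison or per answer is not stated here, so the record layout above is only one plausible way to organise the data.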

What Makes This Different

The research team took extensive precautions to ensure validity:

AI responses were calibrated to match the length and structure of human answers
Multiple evaluation methods were used
Professors assessed whether responses might mislead or confuse students
The study tested multiple AI systems, including commercial tutoring platforms and Google’s NotebookLM

Even when context limitations affected AI responses, professors still frequently preferred them to human-written alternatives.

What is an LLM legal tutor? A large language model trained to answer law student questions with the depth, nuance, and pedagogical quality expected of a law professor — explaining concepts, synthesising material, and helping students develop analytical skills rather than simply providing answers.

The Honest Caveats

Nyarko is careful about what the study does and does not show:

“Our study evaluates the quality of answers given by AI tools. But how to implement these tools to most effectively improve student learning is still an open question. We’re not advocating for wholesale adoption of AI tutors. But our data suggests that blanket skepticism may be equally unwarranted.”

Fair point. The study measures answer quality, not learning outcomes. A professor preferring an AI answer doesn’t mean students learn better from it — that’s a different question entirely. And the study focused on contract law, which is one slice of legal education.

The study also tested AI as a supplemental tutor (answering student questions), not as a replacement for classroom instruction, grading, or mentorship.

The Bigger Picture

This study lands at a moment when law schools are actively wrestling with AI integration. Some have embraced experimentation. Others remain cautious about hallucinations, overreliance, and the erosion of critical thinking. The Stanford data suggests the quality concern — at least for contract law tutoring — may be overstated.

But the study also raises an uncomfortable question: if AI can match or exceed law professors at explaining legal concepts, what does that mean for the £40,000/year law degree? The answer quality is there. Whether students will actually learn as effectively from AI — and whether institutions will adapt fast enough — is the real question.

For NZ, where legal education costs are lower but access to top-tier faculty is limited, AI tutoring tools could genuinely democratise access to expert-level legal explanation. The University of Canterbury is already researching how to rethink AI literacy in education. Studies like this give that work more urgency.

❓ Frequently Asked Questions

Q: Does this mean AI is better than law professors? Not exactly. The study tested AI as a supplemental tutor answering student questions, not as a replacement for everything professors do. Professors preferred AI’s answers in blind comparisons, but teaching involves far more than answering questions — it includes mentorship, course design, assessment, and modelling professional behaviour.

Q: What AI model was used? The study tested multiple AI systems including commercial tutoring platforms and Google’s NotebookLM, finding varying levels of performance. The study was model-agnostic — testing whether the category of AI tutoring tools could meet professional standards, not whether a specific model was superior.

Q: What should law schools do with this? The data suggests law schools should move from “should we use AI?” to “how should we use AI?” The quality threshold for tutoring answers has been met. The open question is deployment — how to integrate AI tools responsibly so they supplement rather than replace the learning process.

🔍 THE BOTTOM LINE

AI didn’t just pass the legal reasoning bar — it beat the people who set it, on their own terms, without them knowing. Whether that’s exciting or terrifying probably depends on whether you’re a student or a professor.

AI-Edu — June 9, 2026