AI Review Summaries Are Flattening Serious Complaints Into Bland Positives

AI-generated review summaries on major travel platforms are systematically downplaying serious complaints (illness, harassment, mould, missing water) in favour of bland positives. That’s the finding of a consumer investigation by the UK campaign organisation Which?, and it is consistent with academic research into how opinion-summarisation AI works. The pattern is structural, not accidental.

🔍 THE BOTTOM LINE

AI review summaries don’t so much summarise reviews as average them. When a one-star review describing serious illness is averaged against dozens of four-star reviews praising the pool, the serious complaint is statistically drowned out.

What the Investigation Found

Which? tested AI-generated review overviews across multiple hotels with documented serious complaints. The pattern was consistent: summaries described properties in broadly positive terms, noting at most “inconsistent” cleanliness or “maintenance issues” — even where guests had reported illness, safety concerns, and harassment.

In one case, a property described in the AI summary as clean and popular was the subject of legal action involving guest health complaints. In another, where guests reported feeling unsafe because of harassment, the AI summary praised the “friendly” service. In a third, where guests reported having no reliable water supply, the AI summary focused on “abundant” amenities.

The platform involved has said it is “confident these features are delivering exactly what they were designed to do.” After the findings were raised, at least one of the AI summaries in question was quietly removed.

Why AI Summaries Sanitise Reality

Duncan Brumby, a professor of human-computer interaction at University College London, told The Guardian the case chimed with his own research into AI in academic peer review. He found AI tends to “sanitise and rub off the edges” of sharper criticisms — likely because the bulk of training data contains far more bland observations than sharp ones.

This is the structural problem. Summarisation models learn from text in which mild, neutral observations vastly outnumber sharp ones. When a model encounters a one-star review describing mould, illness, or harassment alongside dozens of four-star reviews mentioning a “great pool” or a “friendly barman”, the effect is close to averaging: the serious complaint is statistically drowned out. That happens not because the complaint doesn’t matter, but because the model has no reliable way to tell “the towels were a bit thin” from a genuine safety warning.

Research published in Findings of the ACL supports the pattern: opinion-summarisation technologies consistently reduce rich consumer feedback to shallower sentiment. The sharp edges, the ones that actually matter to a traveller deciding whether a hotel is safe, are exactly what the models smooth away.
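The averaging effect described above can be illustrated with a deliberately simple toy. This is a sketch under stated assumptions, not how any platform’s summariser actually works: real systems use language models, not counters. Here a “summary” is assumed to be the mean star rating plus the most frequently mentioned themes, and the reviews, the `SAFETY_THEMES` set, and both functions are hypothetical. The point is only the arithmetic: a single illness report barely moves the average and never makes the “top themes” list, whereas a rule that surfaces safety themes regardless of frequency keeps it visible.

```python
from collections import Counter
from statistics import mean

# Hypothetical reviews for one property: (star rating, theme).
# Illustrative only; these are NOT data from the Which? investigation.
reviews = (
    [(4, "great pool")] * 25
    + [(4, "friendly staff")] * 15
    + [(1, "illness after stay")]
)

# Hypothetical set of themes that should never be averaged away.
SAFETY_THEMES = {"illness after stay", "harassment", "mould", "no water"}


def frequency_summary(reviews, top_n=2):
    """Naive 'summary': mean rating plus the most frequently mentioned themes."""
    avg = mean(stars for stars, _ in reviews)
    themes = Counter(theme for _, theme in reviews).most_common(top_n)
    return round(avg, 1), [theme for theme, _ in themes]


def severity_aware_summary(reviews, top_n=2):
    """Same summary, but any safety theme is surfaced however rare it is."""
    avg, themes = frequency_summary(reviews, top_n)
    warnings = sorted({theme for _, theme in reviews if theme in SAFETY_THEMES})
    return avg, themes, warnings


print(frequency_summary(reviews))       # (3.9, ['great pool', 'friendly staff'])
print(severity_aware_summary(reviews))  # (3.9, ['great pool', 'friendly staff'], ['illness after stay'])
```

In the frequency-based version the illness report moves the average from 4.0 to about 3.9 and disappears from the themes entirely; the severity-aware version reports the same average but keeps the warning.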

The Google Precedent

This isn’t the first time AI summaries have been caught endangering consumers. Earlier this year, Google removed some of its AI health summaries after a Guardian investigation found people were being put at risk of harmful and misleading health information. The failure mode is similar: an AI summarisation system designed to surface “the gist” of a body of content instead surfaces a sanitised, averaged version that strips out the critical warnings.

The Google AI Overviews death spiral documented on this site showed how AI summaries can starve the original content producers of traffic. This case shows the other side: even when the original reviews are still there, the AI summary actively steers readers away from the parts that matter most.

What Platforms Say

The platform investigated by Which? said its systems automatically suppress AI summaries when travellers warn about serious safety incidents, “helping ensure this content is highly visible to our community.” But the investigation found this suppression either didn’t trigger or didn’t work for the properties tested.

The company also said travellers have “the common sense” to check AI summaries against the reviews on the platform. Rory Boland, editor of Which? Travel, disagreed: “The platform has a responsibility to revisit the accuracy of its AI summaries and AI chatbot. In the meantime, users should scroll past these summaries and look at guest reviews, particularly one-star ratings, and at reviews on other sites, to make sure their next stay is a safe one.”
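Neither the platform nor Which? has said how the automatic suppression works or why it failed to take effect. As a purely hypothetical illustration, one common way to build such a rule is keyword matching, and the sketch below shows how a narrow trigger list can miss warnings written in everyday language. The `SAFETY_PATTERN` terms, the `Review` type, and `should_suppress_summary` are all invented for this example; this is not the platform’s system.

```python
import re
from dataclasses import dataclass

# Hypothetical trigger terms; the platform's actual rules are not public.
SAFETY_PATTERN = re.compile(
    r"\b(food poisoning|harass\w*|assault\w*|mou?ld|no water)\b", re.IGNORECASE
)


@dataclass
class Review:
    stars: int
    text: str


def should_suppress_summary(reviews: list[Review], min_reports: int = 1) -> bool:
    """Hypothetical rule: hide the AI summary once enough reviews match a safety term."""
    hits = sum(1 for r in reviews if SAFETY_PATTERN.search(r.text))
    return hits >= min_reports


reviews = [
    Review(4, "Great pool and a friendly barman."),
    Review(1, "Several of us were sick for days after eating here."),
    Review(1, "A staff member kept following me; I felt unsafe."),
]

print(should_suppress_summary(reviews))  # False: neither warning uses a listed term
```

Both one-star reviews describe exactly the kind of incident such a rule is meant to catch, yet neither uses a listed term, so the summary stays up. Adding a review that mentions “mould” would flip the result to `True`, which shows how much a trigger of this kind depends on wording rather than meaning.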

NZ Angle

New Zealanders spend roughly $9 billion a year on international travel, much of it booked through platforms that aggregate reviews. If an AI summary says a property is fine and the actual reviews tell a different story, the booking decision is being made on bad information. Consumer NZ, the local equivalent of Which?, has not yet said whether it plans to test AI review summaries on travel platforms operating in New Zealand. But the problem is global, and it applies to any platform using AI to summarise user feedback.

❓ FAQ

Can I turn off AI review summaries? Most major travel platforms do not offer a setting to disable AI-generated review summaries. The only workaround is to scroll past them and read the individual reviews directly, particularly one-star ratings.

Does this affect other review platforms? Almost certainly. Google, Amazon, Yelp, and other platforms use similar AI summarisation technology. The ACL research finding, that opinion-summarisation tools systematically flatten sharp criticism, is likely to apply to any platform that puts an AI-generated overview in front of human-written reviews.

Are platforms legally liable for misleading AI summaries? Unclear. The UK’s Digital Markets, Competition and Consumers Act gives regulators new powers to tackle misleading commercial practices. If an AI summary significantly misrepresents the sentiment of reviews, a regulator could argue that’s a misleading representation. No case has been brought yet.

What should travellers do? Read the one-star reviews first. Look for patterns — multiple guests reporting the same issue (illness, mould, safety concerns) is a signal regardless of the overall rating. Skip the AI summary entirely.

🔍 THE BOTTOM LINE

The technology isn’t broken. It’s working as designed — averaging out the extremes, smoothing the edges, producing a summary that sounds reasonable. The problem is that “reasonable” is the wrong output when the input includes safety-critical warnings. When a platform says it’s “confident” the AI is doing what it was designed to do, the scariest part is that they might be right.