Corrupted documents flowing into an AI model training pipeline, digital poison concept
AI & Singularity

250 Documents Can Permanently Corrupt Any AI Model

The largest data poisoning study ever conducted shows scale offers zero protection. Training is easy. Untraining is impossible.

AI SafetyData PoisoningAnthropicAI SecurityModel Backdoors

Every AI model in the world can be permanently corrupted by publishing just 250 documents on the internet. Not by hacking. Not by breaking security. By simply planting poisoned content in the training data that every major model scrapes.

That is the finding from the largest data poisoning study ever conducted, released by Anthropic, the UK AI Security Institute, and the Alan Turing Institute.


What Data Poisoning Actually Means

AI models learn from billions of documents scraped from the public internet. If someone can insert corrupted documents into that pool before training begins, they can secretly teach the model to behave in specific harmful ways when it encounters a particular trigger phrase.

The model learns the backdoor during training. It carries it forever. It does not know it is there.

Researchers have known about this attack for years. The assumption was that it required controlling a large percentage of training data — millions of documents — to work on a big model. The bigger the model, the more poisoning you would need.

This study proved that assumption completely wrong.


250 Documents. Every Model. Every Size.

The researchers trained models of four different sizes — from 600 million to 13 billion parameters. They slipped in either 100, 250, or 500 malicious documents. Each poisoned document looked like a normal web page at first — a short extract of legitimate text — and then contained a hidden trigger phrase followed by gibberish.

  • 100 documents: Insufficient. The backdoor did not reliably form.
  • 250 documents: Success. Every model, at every size, was permanently backdoored.
  • 500 documents: Same result as 250.

The number was constant regardless of model size. A model trained on 260 billion tokens needed the same 250 poisoned documents as a model trained on 12 billion. Scale offered zero protection.

Anthropic’s own words: “This challenges the existing assumption that larger models require proportionally more poisoned data.”


Training Is Easy. Unlearning Is Impossible.

Then came the sentence that should end every conversation about AI safety: “Training is easy. Unlearning is impossible.”

Once a backdoor is embedded in the model, it cannot be removed without starting training completely from scratch. You cannot identify which 250 documents caused it. You cannot surgically extract the corrupted behaviour. You must rebuild the entire model from the beginning.

This is the fundamental asymmetry. An attacker needs 250 documents and a few hours of internet access. The defender needs to retrain a multi-billion-dollar model from scratch.


Who Is Exposed

Anyone can publish content to the internet. Academic papers. Blog posts. Forum discussions. Product descriptions. If even a small fraction of that content is deliberately corrupted before a training run begins, the model that learns from it carries the damage permanently and silently.

GPT-5. Claude. Gemini. Every model trained on public internet data is exposed to this attack vector. The defence does not exist yet.

This is not a theoretical concern. The infrastructure for this attack is already in place: automated content farms, AI-generated text at scale, and the fact that training datasets are opaque even to the companies building the models.


Why This Matters Now

The researchers published this not to cause panic — but to force the field to take it seriously before someone uses it.

The AI industry is in the middle of the biggest training run cycle in history. GPT-5, Claude’s next generation, Gemini’s successors — all are being trained on public internet data right now. The window for contamination is open.

Current defences — data filtering, deduplication, quality checks — were not designed to catch this kind of attack. The poisoned documents look legitimate. The trigger phrase can be anything. The backdoor can manifest as any behaviour the attacker chooses: generating harmful content, leaking private information, or simply producing subtly wrong outputs that erode trust over time.


What Comes Next

The study makes several recommendations:

  • Provenance tracking — Know where every document in a training set came from
  • Red-teaming datasets — Actively test training data for poisoning before use
  • Transparency — Companies should disclose what data they train on
  • Regulatory attention — Data poisoning should be treated as a security vulnerability

None of these solve the core problem. Once the poison is in, it cannot be removed. The only real defence is preventing contamination before training begins — and with the current approach of scraping billions of documents from the open internet, that may be impossible.

The era of trusting models because they are big may be over. Size was supposed to be a defence. The research shows it is not.


Sources

Sources: Anthropic, UK AI Security Institute, Alan Turing Institute