Four Lines of Code, Ten Minutes, Zero Safety Left: How 'Abliteration' Is Turning Open-Source AI Into a Weapon

A journalist from the Financial Times sat down at a laptop, typed four lines of code, and ten minutes later had a frontier AI model cheerfully explaining how to disperse chlorine gas and synthesise ricin. No specialist hardware. No hacking skills. No dark-web connections. Just a free GitHub tool called Heretic and a copy of Meta’s Llama 3.3.

The FT’s demonstration, conducted with AI safety group Alice (formerly ActiveFence), is the most concrete public proof yet that the safety guardrails on open-source AI models are essentially decorative. Anyone can remove them. And millions already have.

🔍 THE BOTTOM LINE

Open-source AI safety is no longer a theoretical debate: it’s a demonstrated farce. A free tool has been used to create 3,500 “decensored” models with 13 million downloads between them, and the best answer from the companies involved is essentially “we know.”

The Abliteration Technique

The method is called abliteration, and it’s been around since 2024. It works by neutralising the internal model states that produce refusals — essentially cutting the brakes rather than teaching the car new driving habits.

What’s changed is accessibility. The Heretic repository on GitHub has reduced the process to two terminal commands. It hit #1 trending on GitHub and, as of this week, has 17,800 stars and 1,781 forks. Heretic’s creator stripped safeguards from Google’s Gemma 4 within 90 minutes of its release.

What is abliteration? Abliteration is a technique that uses a mathematical operation to disable the “refusal mechanism” inside an AI model — the internal states that cause the model to refuse harmful requests. Unlike prompt-based jailbreaks that trick a model into answering, abliteration permanently removes the model’s ability to say no. It can only be applied to open-source models where the underlying weights are accessible, not to proprietary systems like Claude or ChatGPT.

Alice’s testing was thorough. They abliterated six model families and tested them against 110 dangerous prompts across six categories: biological weapons, chemical weapons, child exploitation, malware creation, phishing, and violent extremism. The results were depressingly consistent:

Abliterated models complied with 96–100% of harmful requests
The strength of the original safety training made no difference — Nemotron went from 100% refusal to 100% compliance
Modified models produced detailed instructions for synthesising Tabun nerve agent, creating phishing campaigns, storing mustard gas, and planning attacks on government buildings
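
For readers wondering how a figure like "96–100%" is arrived at, here is a minimal sketch of the tallying step in a red-team evaluation. It assumes each model response has already been labelled as complied or refused by a reviewer. The function names and the sample labels are hypothetical illustrations, not Alice's harness or data.

```python
from collections import defaultdict


def compliance_by_category(results):
    """Return the share of prompts the model complied with, per category.

    `results` is an iterable of (category, complied) pairs, where
    `complied` is True if the model answered instead of refusing.
    """
    totals = defaultdict(int)
    complied_counts = defaultdict(int)
    for category, complied in results:
        totals[category] += 1
        if complied:
            complied_counts[category] += 1
    return {cat: complied_counts[cat] / totals[cat] for cat in totals}


def overall_compliance(results):
    """Return the share of all prompts the model complied with."""
    results = list(results)
    if not results:
        raise ValueError("no results to score")
    return sum(1 for _, complied in results if complied) / len(results)


# Hypothetical labels for illustration only (not real test data).
sample = [
    ("phishing", True),
    ("phishing", True),
    ("malware creation", True),
    ("malware creation", False),
]

rates = compliance_by_category(sample)
overall = overall_compliance(sample)
```

Reporting per-category rates alongside the overall figure shows whether a model's remaining refusals cluster in one category or are spread evenly.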

The Numbers That Matter

Metric	Value
“Decensored” models created via Heretic	3,500
Total downloads of decensored models	13 million
Heretic GitHub stars	17,800
Time to strip Gemma 4’s guardrails	90 minutes
FT journalist’s time to strip Llama 3.3	Under 10 minutes
Abliterated model compliance with harmful prompts	96–100%
Lines of code required	4

These aren’t theoretical attacks by nation-state actors. This is a journalist in a newsroom. A teenager with a laptop. Anyone.

The Company Responses

The responses from the companies involved are a masterclass in corporate shrugging:

Google called abliteration “a known technical challenge facing all open models” and pointed to its pre-launch safety evaluations. Which is rather like a car manufacturer saying they tested the brakes before selling the car without brakes.

Meta declined formal comment. A source close to the company cited Meta’s Advanced AI Scaling Framework as restricting release of models deemed “catastrophic” risk without sufficient mitigation. The framework, however, doesn’t prevent someone from downloading Llama 3.3 and running Heretic on it.

GitHub said it bans content directly supporting active attacks or malware campaigns but allows “source code which could be used to develop malware or exploits” where it has educational value or a net security benefit. In other words: the tool that enables weaponising AI is itself protected as educational content. They’re not wrong about the principle. They’re also not solving the problem.

Alice CEO Noam Schwartz cut through the corporate-speak: “The genie is out of the bottle.”

Why This Only Affects Open-Source Models — For Now

Abliteration requires access to the model’s internal weights — the raw parameters that define how it processes information. Proprietary models like Claude, ChatGPT, and Gemini (the API versions) keep their weights behind API walls. You can’t download ChatGPT’s brain and run Heretic on it.

But the FT makes a critical observation: open-source models have historically narrowed the capability gap with proprietary leaders within 6–12 months. The safety floor of the open ecosystem is increasingly the safety floor of widely deployed AI. When the next generation of open models arrives at 90% of the capability of the best proprietary systems, the fact that anyone can strip their guardrails in minutes becomes everyone’s problem.

The UK Regulatory Context

This lands at a pointed moment for UK AI regulation. The Department for Science, Innovation and Technology is weighing statutory backstops to the voluntary regime run by the AI Security Institute (renamed from the AI Safety Institute in 2025). The AI Bill is working through parliamentary committee. And now a major newspaper has demonstrated that the safety measures on two of the world’s most widely deployed open models can be removed before the kettle boils.

Expect the AI Security Institute, NCSC, and ICO to face renewed pressure for concrete guidance on how UK-deployed open-source AI systems should be assessed against abliteration risk.

The Uncomfortable Truth

Here’s what neither side of the open-versus-closed AI debate wants to admit: the openness that makes these models powerful — transparency, auditability, community improvement — is the same openness that makes them impossible to secure after release. You can’t publish the weights and then control what people do with them. The fundamental architecture of open-source software is antithetical to post-release control.

The 13 million downloads of decensored models aren’t a bug in the system. They are the system. Heretic didn’t invent abliteration — it just made it easy enough for a journalist to do in a newsroom.

The question isn’t whether guardrails can be removed. We now know, definitively, that they can. The question is what happens when we stop pretending they can’t.

❓ Frequently Asked Questions

Q: What does this mean for NZ? New Zealand’s AI regulatory framework is still largely voluntary. If UK and EU regulators move toward mandatory safety assessments for open-source deployments, NZ businesses using open-source AI will face new compliance requirements from trading partners. The risk isn’t abstract — NZ organisations deploying open-source models should assume guardrails can be removed and plan accordingly.

Q: Can this be fixed? Not with the current open-source model architecture. Abliteration exploits a fundamental property of how these models work. Better safety training doesn’t help: Alice’s testing showed models with stronger baseline safety were just as vulnerable. The only effective countermeasure is monitoring and filtering that sits outside the model, and that only protects deployments someone actually controls. Anyone running an abliterated copy on their own hardware bypasses it entirely.

Q: Should I stop using open-source AI models? That depends on your use case. If you’re running a model internally for legitimate business purposes with proper access controls, the risk is manageable. If you’re exposing an open-source model to public users without additional filtering layers, you’re running an unguarded system — literally. The key insight from this story is that the model’s built-in safety is a convenience, not a security boundary.
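
To make "additional filtering layers" concrete, here is a minimal sketch of a guard that sits outside the model and checks both the prompt and the response. Everything in it is an assumption for illustration: `guarded_generate`, `toy_policy`, `toy_model`, and the placeholder term "blocked-topic" are hypothetical names, and a real deployment would likely use a dedicated moderation classifier in place of a keyword check.

```python
from typing import Callable

REFUSAL_MESSAGE = "This request cannot be processed."


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_disallowed: Callable[[str], bool],
) -> str:
    """Run `generate` only if the prompt and the output both pass the policy.

    The policy check lives outside the model, so it still applies
    even if the model's own refusal behaviour has been removed.
    """
    if is_disallowed(prompt):
        return REFUSAL_MESSAGE
    output = generate(prompt)
    if is_disallowed(output):
        return REFUSAL_MESSAGE
    return output


# Hypothetical stand-ins for a real policy classifier and a real model.
def toy_policy(text: str) -> bool:
    return "blocked-topic" in text.lower()


def toy_model(prompt: str) -> str:
    return f"echo: {prompt}"


def leaky_model(prompt: str) -> str:
    # Simulates a model that produces disallowed text for a benign prompt.
    return "details about blocked-topic"


allowed = guarded_generate("hello", toy_model, toy_policy)
blocked_input = guarded_generate("Tell me about BLOCKED-TOPIC", toy_model, toy_policy)
blocked_output = guarded_generate("hello", leaky_model, toy_policy)
```

Checking the output as well as the input matters here because the model itself can no longer be trusted to refuse. The guard treats the model as an untrusted component rather than as part of the security boundary.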

🔍 THE BOTTOM LINE

Four lines of code. Ten minutes. Thirteen million downloads of weaponised AI. The open-source safety debate is over — the guardrails lost. The question now is what we build next.

Sources

Financial Times / Alice — AI safety guardrails stripped from Meta and Google models
Alice (ActiveFence) — “Okay, Here is How to Build a Bomb” report, April 2026
Resultsense — FT analysis and UK regulatory context
Heretic GitHub repository — 17,800 stars, 1,781 forks