
AI Didn't Just Break CTF Competitions — It Broke the Pipeline That Trains Security People

Frontier AI models can now auto-solve most CTF challenges. The competitive format that trained a generation of security professionals is turning into a pay-to-win, token-burning contest.


Answer-First Lead

A veteran CTF competitor and former top-10 team member has declared the competitive Capture The Flag scene effectively dead. By that competitor’s account, Claude Opus 4.5 one-shots most medium-difficulty challenges, and GPT-5.5 Pro can solve Insane-difficulty pwn challenges. The scoreboard now measures who can afford to burn the most tokens, not who has the sharpest security skills. And the casualty isn’t just a hobby: it’s the primary pipeline that trained an entire generation of cybersecurity professionals.

🔍 THE BOTTOM LINE

AI didn’t just break a competition format — it broke the ladder that turns curious beginners into skilled security practitioners. If the CTF scoreboard no longer reflects human skill, we’ve lost the best on-ramp we had into cybersecurity.


The Ladder That Built an Industry

Capture The Flag competitions have been the backbone of cybersecurity talent development for over a decade. The format is simple: solve security puzzles (cryptography, reverse engineering, web exploitation, binary pwn), capture the flag (a secret string), climb the scoreboard.

CTFs gave the field something it desperately needed: a meritocratic, measurable way to learn. Beginners could see themselves improve. Strong players could prove their skill. Employers used CTFtime rankings as recruiting signals. Top-tier teams like PPP, DEFKOR, and Dragon Sector became legends.

The pipeline was clear: play CTFs → get good → get noticed → get hired. It worked because the scoreboard meant something. You solved the challenge, you got the points, you earned your rank.

That pipeline is now broken.

What Changed: From Tool to Replacement

When GPT-4 arrived, it could handle some medium-difficulty challenges with a single prompt. That was concerning but manageable — hard challenges still required real skill, and the time savings weren’t dramatic enough to break competition dynamics.

Claude Opus 4.5 changed the equation entirely. According to a detailed post by veteran CTF competitor Kabir, almost every medium-difficulty challenge and some hard ones became agent-solvable. Claude Code made it trivial to build an orchestrator that used the CTFd API to spin up a Claude instance for every challenge, solving most of the board in the first hour while the human player was left copying flags rather than finding them.

Anthropic’s own red-team research on CTFs points the same way: Claude is competitive with human players in cybersecurity competitions, and Anthropic said so plainly.

Then GPT-5.5 Pro arrived and sealed it. By the author’s account, it can one-shot Insane-difficulty active leakless heap exploitation challenges on HackTheBox. In a 48-hour CTF, orchestrate Pro against the hardest challenges and there’s a good chance you capture the flag before time runs out.

CTFs are now pay-to-win. The more tokens you can throw at a competition, the faster you burn down the board. Scoreboard performance no longer reflects security skill; it reflects budget and a willingness to delegate to machines.

Why It Matters Beyond the Scoreboard

This isn’t just about hurt feelings among competitive nerds. The implications go deep:

The talent pipeline is damaged. CTFs were the primary way people got into security. That feedback loop — solve challenges, see yourself improve, climb the ladder — is now broken. Beginners are pushed toward using AI before they’ve built the instincts the AI replaces. That’s an anti-pattern. Active struggle is what teaches you, and the scoreboard no longer rewards it.

Recruiting signals are broken. Companies used CTFtime rankings to find talent. When those rankings reflect AI orchestration budget rather than human skill, the signal becomes noise. This is especially relevant in NZ, where the cybersecurity talent pool is small and every signal matters. We’ve covered a related AI impact on hiring in our piece on how AI resume bias is reshaping recruitment, but the CTF collapse is a different kind of hiring disruption entirely.

Challenge authors are demoralised. The people who spent weeks crafting beautiful, educational challenges are watching them get eaten by agents in minutes. Why invest that effort when the competitive format no longer values the craft?

The “beginners can still learn” take misses the point. Yes, beginners can use CTFs for learning. They always could. But CTFs weren’t just puzzles — they were a ladder. The visible, meritocratic scoreboard was the motivation. Remove the meaning from the scoreboard, and you’ve removed the reason most people showed up in the first place.

The Chess Analogy (And Why It Doesn’t Help)

The obvious counter: chess engines dominated chess decades ago, and chess is bigger than ever. CTFs will just become like chess — the competitive format persists alongside machines that are better at it.

Kabir addresses this directly. Chess solved the problem by creating clear divisions: human-only tournaments, engine-only tournaments, and human+engine tournaments. CTFs have no such divisions. There’s no way to verify whether a team used AI. The open online format that most people play has no enforcement mechanism.

Chess also has a deeply entrenched institutional structure — FIDE, grandmaster titles, century-old traditions. CTFs are largely informal, community-run, and online. There’s no governing body to institute a “human-only” division.

What Comes Next

Several paths forward are possible:

  1. AI-separated divisions. CTFs could run in explicit tiers: human-only (proctored), AI-assisted, and AI-only. The hard part is enforcement — proving a team didn’t use AI in an online competition is essentially impossible without invasive monitoring.

  2. Shift to educational platforms. Beginners may be better served by picoGym, HackTheBox, and other learning-focused platforms where the point is skill development, not competition. The scoreboard problem goes away when the scoreboard isn’t the point.

  3. Challenge design evolution. CTFs could move toward challenges that are harder for AI: more multi-step, more dependent on human intuition, less amenable to prompt-and-solve. But this is an arms race, and frontier models are catching up fast.

  4. Accept the new reality. Perhaps CTFs become AI orchestration competitions, and that’s fine for what it is — but we should stop pretending the scoreboard reflects security skill the way it used to.

For New Zealand, this matters more than it might seem. Our cybersecurity workforce is small and heavily reliant on the CTF pipeline. DownUnderCTF, Australia’s largest CTF, has been a critical entry point for ANZ security talent. If that pipeline degrades, NZ organisations need alternative pathways — and fast.

❓ Frequently Asked Questions

Q: What does this mean for NZ’s cybersecurity workforce? NZ already faces a cybersecurity skills shortage. If CTFs stop being a reliable training and recruiting pipeline, NZ organisations need to invest in structured learning programs, apprenticeships, and security certifications as alternative pathways. DownUnderCTF remains valuable as a learning tool, but its competitive signalling value is degrading.

Q: Can’t we just ban AI in CTFs? Not realistically. Online CTFs have no reliable way to detect AI use. Proctored, in-person events could enforce it, but those are expensive and exclude most participants. The open online format that most people play is effectively unpoliceable.

Q: Should beginners still play CTFs? Yes — but with adjusted expectations. CTFs are still excellent learning environments. The problem is with the competitive scoreboard, not with the puzzles themselves. Use CTFs for learning, not for ranking.


🔍 THE BOTTOM LINE

The CTF scene isn’t dying because AI is bad — it’s dying because AI is too good at the thing that was supposed to separate skilled humans from unskilled ones. When the fastest path up the ladder is delegating to a machine, the ladder has stopped measuring what it was built to measure. The cybersecurity community needs to build new on-ramps — because the old one just got automated.


Sources

Kabir's Blog (kabir.au); Anthropic Red Team; CTFtime