The USB-C of AI has a problem
On March 11, 2026, Perplexity CTO Denis Yarats stood on stage at the company’s Ask 2026 developer conference and announced Perplexity was moving away from MCP internally. The irony was immediate: Perplexity still has an official MCP server listed on its docs site. The company that built an MCP integration was publicly walking away from it at its own conference.
Perplexity isn’t alone. Cloudflare replaced MCP tool-calling with code generation. Y Combinator CEO Garry Tan got so frustrated he built a CLI alternative. Google Workspace quietly dropped MCP support in v0.8.0. Pieter Levels declared MCP dead. The protocol that was supposed to be the universal connector for AI agents is facing its first serious backlash — and the criticisms are technical, specific, and backed by production data.
What is MCP?
MCP (the Model Context Protocol) is Anthropic’s open standard for connecting AI models to external tools and data sources. Launched in late 2024, it was billed as “the USB-C of the AI ecosystem,” a universal connector that would let any AI agent talk to any tool (GitHub, Slack, Notion, databases) through a single protocol. The idea was elegant: one standard to rule them all.
The reality has been messier.
The token tax
The core complaint is simple math. Every MCP tool sends its complete schema, parameter definitions, and description to the LLM on every request. More tools means more tokens consumed before the agent processes a single user query.
The numbers are brutal:
| Setup | Tokens consumed by tool definitions | Context window % |
|---|---|---|
| MySQL MCP (106 tools) | ~54,600 | 27% of 200K |
| 5 MCP servers (30 tools each) | 30,000–60,000 | 15–30% of 200K |
| Anthropic’s worst case | 134,000 | 67% of 200K |
| Apideck benchmark (3 servers, 40 tools) | 143,000 | 71.5% of 200K |
Quandri Engineering measured their own stack: Linear (42 tools, ~12,800 tokens), Notion (14 tools, ~4,000 tokens), Slack (12 tools, ~3,800 tokens), Postgres (9 tools, ~438 tokens). Total: 77 tools consuming ~21,000 tokens — 10.5% of Claude’s context window before any actual work begins.
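The arithmetic behind those totals is easy to reproduce. The sketch below simply sums the per-server figures quoted above; the 200K context window is an assumption carried over from the table, and the function name `schema_overhead` is made up for illustration.

```python
# Per-server figures quoted above from Quandri Engineering: (tools, approx. tokens).
STACK = {
    "Linear": (42, 12_800),
    "Notion": (14, 4_000),
    "Slack": (12, 3_800),
    "Postgres": (9, 438),
}

CONTEXT_WINDOW = 200_000  # assumption: the 200K window used in the table above


def schema_overhead(stack, window=CONTEXT_WINDOW):
    """Return (total tools, total tokens, share of the context window)."""
    tools = sum(t for t, _ in stack.values())
    tokens = sum(tok for _, tok in stack.values())
    return tools, tokens, tokens / window


tools, tokens, share = schema_overhead(STACK)
print(f"{tools} tools, {tokens:,} tokens, {share:.1%} of context")
```

Swapping in your own server list gives a quick estimate of how much of the window is gone before the first user message arrives.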
And it gets worse when the agent actually uses the tools. Looking up a single Linear issue via MCP consumes roughly 65× more tokens than the equivalent curl command. The MCP approach requires loading all 42 Linear tool definitions (12,800 tokens) plus the query and response. The CLI approach: ~200 tokens total.
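The 65× figure follows from the numbers in that paragraph once you pick a size for the query and response. The sketch below assumes the call itself costs about 200 tokens on both paths; that value is an assumption for illustration, not a measurement.

```python
LINEAR_SCHEMA_TOKENS = 12_800  # from the text: 42 Linear tool definitions
CLI_TOTAL_TOKENS = 200         # from the text: the CLI approach, total
CALL_TOKENS = 200              # assumption: query + response cost on the MCP path


def mcp_to_cli_ratio(schema_tokens, call_tokens, cli_tokens):
    """Ratio of tokens spent via MCP to tokens spent via a CLI call."""
    return (schema_tokens + call_tokens) / cli_tokens


ratio = mcp_to_cli_ratio(LINEAR_SCHEMA_TOKENS, CALL_TOKENS, CLI_TOTAL_TOKENS)
print(f"MCP uses about {ratio:.0f}x the tokens of the CLI call")
```

Nearly all of the gap comes from the schema term: the definitions are paid for whether or not the agent ever calls most of those tools.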
The doom loop problem
Token consumption isn’t just a cost problem. Research consistently shows that LLM reliability degrades as context volume increases: more tool descriptions mean worse tool-selection accuracy. In Tau-Bench testing, Claude 3.7 Sonnet completed only 16% of airline booking tasks using MCP tools, a rate that would be unacceptable in any production system.
This creates what developers call the “MCP doom loop”: more tools → more context → worse performance → agent fails → retries → more tokens → more cost → worse performance. The system literally gets worse the more you connect to it.
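A toy cost model shows why retries hurt so much when every attempt resends the full set of tool definitions. Everything here is a simplifying assumption: retries are treated as independent with a fixed success rate (in the loop described above the rate would fall as context grows, making things worse), and the token counts and rates are hypothetical.

```python
def tokens_for_attempts(schema_tokens, work_tokens, attempts):
    """Total tokens when every attempt resends all tool definitions."""
    return attempts * (schema_tokens + work_tokens)


def expected_attempts(success_rate):
    """Mean attempts until the first success, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


SCHEMA_TOKENS = 21_000  # the Quandri total quoted earlier
WORK_TOKENS = 500       # hypothetical per-attempt query + response

for rate in (0.9, 0.5, 0.25):  # hypothetical success rates
    attempts = expected_attempts(rate)
    cost = tokens_for_attempts(SCHEMA_TOKENS, WORK_TOKENS, attempts)
    print(f"success rate {rate:.0%}: ~{attempts:.1f} attempts, ~{cost:,.0f} tokens")
```

Even in this optimistic model, halving the success rate doubles the expected bill, and the schema overhead is multiplied along with it.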
Perplexity’s exit
Perplexity’s move is the highest-profile defection. Yarats cited two specific problems: context overhead from tool definitions eating into the model’s working memory, and authentication friction from MCP’s decentralised auth model creating integration headaches.
Their replacement is the Perplexity Agent API: a single endpoint supporting six model providers, with built-in tools for web search, URL fetch, and function calling. It’s OpenAI SDK-compatible, meaning developers can swap in Perplexity’s search capabilities with a one-line change.
The strategic signal matters more than the technical details. Perplexity built an MCP server, adopted the standard, used it in production, and then publicly abandoned it at their own developer conference. That’s not a company that didn’t try MCP. That’s a company that tried it and decided the trade-offs weren’t worth it.
Cloudflare’s radical alternative
Cloudflare’s response is the most elegant. Their platform has over 2,500 API endpoints. Representing all of them as MCP tools would consume 1.17 million tokens. Instead, they built “Code Mode”: the agent gets just two tools, search() and execute(), that accept JavaScript code. The agent writes code that calls Cloudflare’s APIs directly, consuming roughly 1,000 tokens total.
That’s a 99.9% reduction. Cloudflare kept MCP for discovery (finding which APIs exist) but replaced the tool-calling mechanism entirely with code generation.
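The shape of the two-tool pattern can be sketched in a few lines. This is not Cloudflare’s implementation: Code Mode accepts JavaScript, while Python stands in here, and the catalog, endpoint names, and `api` object are all hypothetical. Note also that `exec()` is not a sandbox, so a real system needs proper isolation.

```python
# Hypothetical in-memory catalog standing in for a large API surface.
CATALOG = {
    "dns.list_records": lambda zone: [f"A {zone} 192.0.2.1"],
    "dns.count_records": lambda zone: 1,
    "cache.purge": lambda zone: f"purged {zone}",
}


def search(query):
    """Tool 1: return the names of catalog entries matching the query."""
    return sorted(name for name in CATALOG if query.lower() in name.lower())


def execute(code):
    """Tool 2: run agent-written code against the catalog and return `result`.

    NOTE: exec() with empty builtins is NOT a security boundary.
    """
    scope = {"api": dict(CATALOG)}
    exec(code, {"__builtins__": {}}, scope)
    return scope.get("result")


print(search("dns"))
print(execute('result = api["cache.purge"]("example.com")'))

# The reduction quoted above: ~1,000 tokens instead of 1.17 million.
reduction = 1 - 1_000 / 1_170_000
print(f"{reduction:.1%}")
```

The model only ever sees the two tool definitions; the thousands of endpoints stay out of the context window until a search surfaces the handful that matter.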
Is MCP really dead?
Not entirely. MCP still has value when:
- No CLI exists for the service — web-only SaaS where MCP may be the only connection method
- Non-developer users need tool access — MCP is more accessible than terminals
- Safety guardrails matter — MCP servers can enforce read-only mode and block dangerous queries at the server level, which CLIs can’t do
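The guardrail point can be illustrated with a server-side check that runs before any query reaches the database. This is a deliberately naive sketch: the `guard_query` function is hypothetical, string checks like this are easy to get wrong, and a real server should also rely on a database role that only has read permissions.

```python
import re

# Assumption: only plain SELECT statements count as read-only in this sketch.
_READ_ONLY = re.compile(r"^\s*select\b", re.IGNORECASE)


def guard_query(sql):
    """Return the query if it looks read-only, otherwise raise PermissionError."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # Conservative: also rejects semicolons inside string literals.
        raise PermissionError("multiple statements are not allowed")
    if not _READ_ONLY.match(statement):
        raise PermissionError("only SELECT statements are allowed")
    return statement


print(guard_query("SELECT id, title FROM issues;"))
```

Because the check lives in the server, it applies to every client and every prompt, which is the property a bare CLI invocation lacks.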
Anthropic has also responded with Tool Search with Deferred Loading, which loads MCP tool schemas on-demand and reduces context usage by 85%+. The context bloat problem is being addressed — though the performance, debugging, and architectural criticisms remain.
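The general idea behind deferred loading can be sketched as follows: send only a short index of tool names and summaries up front, then load a full schema when a search matches it. This is an illustration of the pattern, not Anthropic’s API; the registry, token costs, and function names are all hypothetical, and the savings it prints say nothing about the 85% figure above.

```python
# Hypothetical registry: name -> (one-line summary, full-schema token cost).
REGISTRY = {
    "linear.get_issue": ("Fetch one issue by ID", 300),
    "linear.create_issue": ("Create a new issue", 350),
    "slack.post_message": ("Post a message to a channel", 320),
    "notion.search": ("Search pages", 280),
}
SUMMARY_TOKENS = 15  # assumption: rough cost of one name + summary line


def upfront_cost(registry):
    """Tokens spent when every full schema is sent on every request."""
    return sum(cost for _, cost in registry.values())


def deferred_cost(registry, needed):
    """Tokens spent on the summary index plus schemas loaded on demand."""
    index = SUMMARY_TOKENS * len(registry)
    loaded = sum(registry[name][1] for name in needed)
    return index + loaded


def find_tools(registry, query):
    """Keyword match on name or summary (stand-in for a real search tool)."""
    q = query.lower()
    return [
        name
        for name, (summary, _) in registry.items()
        if q in name.lower() or q in summary.lower()
    ]


needed = find_tools(REGISTRY, "issue by id")
print(needed, upfront_cost(REGISTRY), deferred_cost(REGISTRY, needed))
```

The saving grows with the size of the registry, since the index costs a few tokens per tool while each unused schema costs hundreds.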
The most honest take: MCP is in the trough of disillusionment on the Gartner hype cycle. The standard isn’t dead, but the “USB-C of AI” narrative is. The reality is that MCP is one tool among many, useful in specific contexts and actively harmful in others. That’s not a eulogy; it’s a grown-up understanding of how standards actually work.
🔍 THE BOTTOM LINE
MCP was oversold as the universal connector for AI agents and is now paying the price of overpromising. The token tax is real, the doom loops are documented, and major companies are walking away. But “MCP is dead” is as much hype as “MCP is the USB-C of AI.” The truth is somewhere in between: a useful protocol for specific use cases, not a universal standard. The agent ecosystem will fragment before it converges — and that’s probably fine.
❓ Frequently Asked Questions
Q: What replaces MCP? There’s no single replacement. The emerging pattern is “CLI-first + Skills”: provide CLI commands and API docs that LLMs already know, and load them on demand rather than upfront. Cloudflare’s Code Mode (two tools that execute JavaScript) is the most radical version of this approach.
Q: Should I stop using MCP? If you’re connecting 2–3 tools with a small total schema, MCP works fine. If you’re connecting 10+ tools or hitting context limits, consider the CLI-first or Skills pattern instead. The Quandri Engineering post has detailed benchmarks to help you decide.
Q: What does this mean for the agent ecosystem? Fragmentation, probably. Without a universal standard, each platform will develop its own integration patterns. That’s messy but not catastrophic — it’s how most ecosystems actually evolve before convergence.
SOURCES
- Quandri Engineering — MCP is dead
- Paperclipped — MCP Protocol Backlash Explained
- Cloudflare Blog — Code Mode for MCP
- Hacker News — MCP is dead (325 points)
- Apideck — MCP context benchmark data