OpenAI has unveiled Jalapeño, its first custom-built inference processor, designed and manufactured in collaboration with Broadcom. The chip targets the single most expensive line item in modern AI: running trained models in response to user requests. Early results show “significantly better performance-per-watt than current state-of-the-art alternatives,” according to OpenAI — and OpenAI’s own AI models assisted in the chip’s development.
🔍 THE BOTTOM LINE
The battleground for AI dominance has shifted from training (where Nvidia still rules) to inference (where the money actually bleeds). By building its own silicon, OpenAI joins Google, Amazon, and Meta in a race to eliminate the GPU tax on every API call, every ChatGPT response, and every Codex suggestion. The chip that wins the inference war wins the margin war — and Jalapeño is OpenAI’s opening bid.
What Jalapeño Actually Does
Jalapeño is an inference processor, not a training chip. That distinction matters. Training a frontier model requires thousands of Nvidia GPUs running for months. Inference — serving that trained model to millions of users — is a different computational shape entirely, optimized for throughput and latency rather than raw floating-point capacity.
OpenAI president Greg Brockman explained the design philosophy: “We have a deep understanding of the workload. We’ve really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what’s possible?”
The chip specifically targets low operating costs for real-time coding models — the exact workload powering Codex and OpenAI’s agentic products. Pre-training and other compute-intensive tasks will still rely on Nvidia hardware. But inference is where the unit economics live or die, and Jalapeño is designed to dominate that cost-per-token calculation.
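To make the link between performance-per-watt and cost-per-token concrete, here is a minimal sketch of the electricity side of the calculation. Every number in it is a hypothetical placeholder: OpenAI has published no throughput or power figures for Jalapeño. The function name and parameters are illustrative only, and the sketch ignores hardware amortization, cooling, and networking, which also feed into the real cost per token.

```python
def energy_cost_per_million_tokens(tokens_per_second: float,
                                   watts: float,
                                   usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens.

    Covers power draw only; a real cost-per-token figure would also
    include chip cost, cooling, networking, and utilization.
    """
    # Performance-per-watt: (tokens/s) / (J/s) = tokens per joule.
    tokens_per_joule = tokens_per_second / watts
    joules_needed = 1_000_000 / tokens_per_joule
    kwh_needed = joules_needed / 3_600_000  # 1 kWh = 3.6 million joules
    return kwh_needed * usd_per_kwh


# Hypothetical accelerators, NOT measured figures for any real chip.
PRICE_USD_PER_KWH = 0.10  # assumed electricity price

baseline_cost = energy_cost_per_million_tokens(
    tokens_per_second=1000.0, watts=700.0, usd_per_kwh=PRICE_USD_PER_KWH)

# Same throughput at half the power, i.e. double the performance-per-watt.
efficient_cost = energy_cost_per_million_tokens(
    tokens_per_second=1000.0, watts=350.0, usd_per_kwh=PRICE_USD_PER_KWH)

savings_ratio = baseline_cost / efficient_cost
```

Under these assumed inputs, doubling performance-per-watt halves the electricity cost per million tokens. The absolute saving per request is tiny, but it is multiplied across every request a service handles.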
Why OpenAI Is Building Across the Stack
The significance extends beyond a single chip. OpenAI framed the announcement as part of a full-stack strategy: “OpenAI is not only developing frontier models or building products on top of them; it is designing the infrastructure underneath them: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience.”
This mirrors the approach Google pioneered with TPUs and Amazon followed with Trainium and Inferentia. A company that controls every layer, from silicon to model to product, can remove inefficiencies that no off-the-shelf GPU can reach. It also reduces dependence on Nvidia, whose premium margins on H100 and B200 GPUs inflate every AI lab’s cloud bill.
OpenAI’s chip portfolio now spans three partners: the $20 billion Cerebras deal for wafer-scale high-bandwidth workloads, Broadcom/Jalapeño for cost-efficient inference, and Nvidia for training. It’s a hedge — and a clear signal that the era of single-vendor dependency is over.
The Nvidia Erosion Pattern
Nvidia still dominates AI training. That hasn’t changed. But every custom inference chip that ships at a frontier lab is a workload that doesn’t run on an Nvidia GPU. The cumulative effect across Google, Amazon, Meta, Microsoft, and now OpenAI is a slow erosion of the assumption that Nvidia wins every deal by default.
The pattern is already visible in China, where Nvidia has conceded the AI chip market to Huawei. It is also visible in the five-year-old A100 servers now selling for up to $82,000 there: a sign that supply-constrained markets will pay steep premiums for inference capacity, which makes custom silicon even more attractive to labs that can afford the upfront design cost.
NZ Angle — New Zealand’s Chip Dependency
New Zealand has zero domestic semiconductor manufacturing. Every AI chip that powers every local deployment — from university research clusters to Xero’s machine learning pipelines — is imported. The Jalapeño announcement underscores a structural vulnerability: NZ’s AI capabilities are entirely dependent on supply chains and export-control decisions made in Washington, Taipei, and Seoul.
If the US expands export controls to cover custom inference chips — not just training GPUs — NZ’s ability to deploy cutting-edge AI services could be throttled overnight. The Huawei Ascend 910C running DeepSeek V4 Pro demonstrates that China is building its own alternatives, but NZ sits in a geopolitical gap: too small to negotiate independently, too allied to pivot to Chinese hardware without friction.
The Other Side — Limitations and Skepticism
Jalapeño is still being tested. “Early results show significantly better performance-per-watt” is a company claim, not an independent benchmark. Custom silicon programs typically take 18–24 months from announcement to production deployment, and the Broadcom partnership was only officially confirmed in October 2025.
There’s also the CUDA problem. Nvidia’s moat isn’t just hardware — it’s the software ecosystem that every ML engineer is trained on. OpenAI can design its own kernels, but the broader ecosystem still runs on CUDA, and migration friction is real. And while Jalapeño targets inference, the training side of the business still requires Nvidia’s top-tier GPUs, meaning OpenAI’s Nvidia bill isn’t shrinking — it’s just growing more slowly.
❓ FAQ
Q: Does Jalapeño mean OpenAI is abandoning Nvidia? A: No. Jalapeño handles inference; training still runs on Nvidia GPUs. OpenAI is running a portfolio strategy — Cerebras for bandwidth, Broadcom for inference cost, Nvidia for training — rather than betting on a single supplier.
Q: When will Jalapeño chips be in production? A: OpenAI says the chip is “still being tested.” Custom silicon typically takes 18–24 months from partnership announcement to production deployment, and the Broadcom deal was confirmed in October 2025.
Q: What does “performance-per-watt” actually measure? A: It measures how much inference work (tokens generated, requests served) the chip completes for each watt of power it draws. More work per watt means less electricity and cooling per request, which lowers data-center operating costs and can in turn lower the cost of serving each API call.
Q: Could export controls restrict Jalapeño from reaching New Zealand? A: Unlikely in the short term, because Jalapeño is an internal OpenAI chip, not a commercial product. But if the US expands AI chip export controls to cover inference accelerators, not just training GPUs, that would set a precedent for restricting access to services built on chips like it.
Q: How does this compare to Google’s TPU program? A: Google is roughly 5–7 years ahead, with TPUs already running production workloads at scale. OpenAI is playing catch-up, but it has the financial resources to try to close the gap quickly.
🔍 THE BOTTOM LINE
Jalapeño is a strategic declaration of compute independence. By mastering the inference layer — the operational cost center — OpenAI is building an economic moat that complements its model leadership, making it less susceptible to Nvidia’s pricing power or supply constraints. For New Zealand, it reinforces the urgent need for national digital resilience planning around AI hardware dependencies that we have zero control over.
📰 Sources
TechCrunch — OpenAI unveils its first custom chip, built by Broadcom
OpenAI (announcement and Greg Brockman podcast appearance, as cited in TechCrunch)