AMD Just Put a 235-Billion-Parameter Model on a $2,000 Lunchbox. The Mac Studio Won't Be Here for Months.

On Saturday evening, New Zealand time, AMD chief executive Lisa Su walked on stage and held up a lunchbox-sized PC. Inside the matte-black case was a single AMD Ryzen AI Max+ 395 chip and 128GB of unified memory. On the screen behind her, a 235-billion-parameter Qwen3 model was generating text, locally, with no cloud subscription, no API key, and no data leaving the box. The clip, posted by @starmexxx on Saturday at 6:36 PM NZT, has already pulled 36,700 views, 218 likes, and 257 bookmarks. The demonstration is the strongest signal yet that the “local AI” question has changed answers — not in 2027, not after the next Mac Pro, but right now, in a $2,000 box that ships from Shenzhen.

It is also a quiet indictment of two companies New Zealand developers have been waiting on. The Mac Studio M5 Ultra with enough memory to run a frontier model locally has been pushed to October 2026. The 512GB M3 Ultra Mac Studio has been quietly killed entirely. The base Mac mini is listed as “Currently Unavailable” on Apple’s online store. Tim Cook, on Apple’s April 30 earnings call, warned that Mac mini and Mac Studio supply “may take several months to reach supply-demand balance.” NVIDIA, meanwhile, has cut GeForce RTX 50-series production by 30–40% in the first half of 2026 to redirect the high-bandwidth memory (HBM) it needs to AI hyperscalers. SK Hynix holds roughly two-thirds of NVIDIA’s 2026 HBM4 allocation for the Vera Rubin platform. The consumer GPU is no longer the priority. AMD, using a different memory architecture entirely, has slipped through the gap.

🔍 THE BOTTOM LINE

The Mac Studio you wanted is six to nine months away and will cost NZ$8,000–NZ$12,000 when (or if) it arrives. The NVIDIA RTX 5090 you were saving for is rationed. Meanwhile, a $2,000 Chinese mini PC with an AMD chip you have never heard of is sitting in a Shenzhen warehouse right now, ready to run a 235-billion-parameter model on your desk this week. The story for New Zealand developers and small studios is not whether to wait for Apple or NVIDIA. It is whether to import the AMD box now and accept that “local” runs at 4 to 24 tokens per second depending on the model size — which is, for most working uses, fast enough.

What Lisa Su Actually Showed

The chip at the centre of the demonstration is the AMD Ryzen AI Max+ 395, codenamed Strix Halo. It was released in April 2025, well before the HBM crunch became news. Sixteen Zen 5 CPU cores, thirty-two threads, a 5.1 GHz turbo clock on TSMC’s 4nm process. The integrated GPU is the Radeon 8060S, 40 RDNA 3.5 compute units running at up to 2.9 GHz. The 50-TOPS XDNA 2 NPU handles lightweight on-device tasks. None of that is the story.

The story is the memory. The 395 supports between 64GB and 128GB of LPDDR5X-8000 — soldered to the package, shared between the CPU and the GPU as a single unified address space. That is the same trick Apple pulled with the M-series “unified memory architecture,” and the same reason Apple Silicon has dominated local AI inference. AMD has now done it on x86. The trade-off, in plain terms: LPDDR5X is much slower than HBM. An HBM-equipped NVIDIA card feeds a model at terabytes per second. The 395’s LPDDR5X feeds it at roughly 250–300 gigabytes per second. That is why the 235B model runs but slowly. It is also why the 395 is available at all. LPDDR5X is made by the same three vendors (SK Hynix, Samsung, Micron) that make HBM, but the LPDDR5X production line is not the bottleneck. The HBM line is.
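The link between memory bandwidth and generation speed can be put into rough numbers. The sketch below is a back-of-envelope estimate, not a benchmark: it assumes a 256-bit memory bus (the article gives only the LPDDR5X-8000 data rate) and assumes that generating each token requires reading every active weight from memory once. The function names are illustrative.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


def decode_ceiling_tok_s(bandwidth_gb_s: float,
                         active_params_billions: float,
                         bits_per_weight: float) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Assumption: 256-bit bus width; 8000 MT/s comes from "LPDDR5X-8000".
bandwidth = peak_bandwidth_gb_s(8000, 256)

# A dense 32B model at 4-bit reads all 32B weights for every token.
dense_32b = decode_ceiling_tok_s(bandwidth, 32, 4)

print(f"peak bandwidth: {bandwidth:.0f} GB/s")
print(f"dense 32B @ 4-bit ceiling: {dense_32b:.1f} tok/s")
```

Real throughput lands well below a ceiling like this, because the estimate ignores compute time, KV-cache reads, and software overhead. It also hints at why mixture-of-experts models, which read only a fraction of their weights per token, can run faster than smaller dense models on the same box.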

The other trade-off: the memory is soldered. You cannot add more later. The 64GB and 128GB configurations ship from the factory and that is what you get. The Geeky Gadgets review of the GMKTec EVO-X2, one of the mini PCs built around the 395, notes that the 240W power brick is a giveaway: this is a desktop replacement, not a portable. The case runs warm under sustained AI load, with measured temperatures of 85–98°C in Performance mode. It is not silent. It is also not expensive.

The Real Numbers

AMD’s headline claim is that the 395 “beat an NVIDIA RTX 5080 by more than 3x on DeepSeek R1 inference.” The RTX 5080 is NVIDIA’s $1,000 consumer card, one tier below the flagship RTX 5090. If the 3x figure holds up in third-party testing, it would be a remarkable result for a chip that also includes a 16-core CPU. As of publication, third-party verification is incomplete, and Nish Tahir’s October 2025 benchmarks on the EVO-X2 (run on an earlier, less-optimised software stack) show a more sober picture.

What Tahir actually measured, on a 128GB EVO-X2 running Ubuntu 24.04 with the ROCm driver stack:

Model	Quantisation	Tokens / second
gpt-oss:20b	4-bit	23.80
gpt-oss:120b	4-bit	14.77
qwen3:32b	4-bit	4.42

The 20-billion-parameter class is genuinely usable for coding assistance, document drafting, and chat; it is faster than a human reads. The 120-billion-parameter class works but requires patience; 14.77 tok/s is roughly two-and-a-half times slower than ChatGPT Pro in casual use. The 32B Qwen3 at 4.42 tok/s is functional, not fluent: you can read its output, but you would not want to live in it. None of those models is the 235B that Lisa Su showed on stage. The 235B demo, in fairness to AMD, was probably running a low-bit quantisation of the model with a short context window; it was a “this is possible” demonstration, not a daily-driver experience.

The honest read: the AMD box is the best sub-$3,000 local AI machine you can buy in June 2026, by a wide margin. The 395’s 20B sweet spot puts it within striking distance of GPT-3.5 quality for most tasks, with no per-token cost and no data leaving the machine. The 120B performance is competitive with cloud APIs on a per-token basis, even if it is slower. The 235B capability is real but should be treated as future-proofing, not as the day-to-day use case.

The “Replaces $440/Month” Claim

The X post that broke the story makes a specific math argument. Claude Code Max at $200, ChatGPT Pro at $200, Cursor at $20, Gemini at $20: that is $440 per month, or $5,280 per year. At that rate, a $2,000 box pays for itself in about four and a half months. The math is correct, and the math is also the wrong frame.

Three things the math does not include. First, Claude Code Max and ChatGPT Pro ship with frontier models in the 200B–500B class — larger and more capable than the 235B Qwen3 demonstrated on the EVO-X2. A subscription gets you a top-tier model and the orchestration around it. A box gets you a small model that runs at usable speed and a frontier model that runs at 4 tok/s. Second, the cloud tools have polished user experiences: chat history, agent loops, tool calling, file context, web search. The local stack has Ollama and a CLI. That gap is closing fast, but it is not closed. Third, the box is one machine. Your phone, your work laptop, and your partner’s iPad all still need cloud subscriptions for the foreseeable future. The $2,000 box does not retire your AI budget; it supplements it.

The other way to read the math: at any subscription level above $80/month, the box breaks even inside two years on raw inference cost alone, even if you keep the cloud tools for the things the local box cannot do. For a one-person studio, a small agency, or a developer who already pays for two of those four services, the case is real. For everyone else, the box is a sidecar, not a replacement.
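The break-even arithmetic in this section can be checked with a few lines. This is a minimal sketch that uses the article's US$2,000 box price and the four subscription prices quoted above; it assumes the box fully replaces whatever monthly spend is plugged in, and it ignores electricity, import costs, and resale value. The function name is illustrative.

```python
def break_even_months(box_cost: float, monthly_spend_replaced: float) -> float:
    """Months until the box cost equals the subscription spend it replaces."""
    if monthly_spend_replaced <= 0:
        raise ValueError("monthly_spend_replaced must be positive")
    return box_cost / monthly_spend_replaced


BOX_COST_USD = 2000

# Claude Code Max, ChatGPT Pro, Cursor, Gemini (prices as quoted in the article).
full_stack = 200 + 200 + 20 + 20

months_full_stack = break_even_months(BOX_COST_USD, full_stack)

# Monthly spend at which the box breaks even in exactly 24 months.
two_year_threshold = BOX_COST_USD / 24

print(f"full stack: ${full_stack}/month, break-even in {months_full_stack:.1f} months")
print(f"two-year break-even threshold: ${two_year_threshold:.2f}/month")
```

On these inputs the full stack pays back in about four and a half months, and the two-year threshold comes out a little above $83 a month. Swapping in a landed New Zealand price stretches both figures accordingly.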

Why the Mac Studio and the RTX 5080 Aren’t Coming

The local-AI hardware story for the first half of 2026 has been a story of absence. Apple’s M5 Ultra Mac Studio, the natural heir to the M3 Ultra as the consumer-friendly local-AI machine, was rumoured for WWDC 2026 in June. Bloomberg’s Mark Gurman reported on April 19 that the launch is now expected around October 2026, and that the next MacBook Pro with the M6 chip has been pushed to 2027. The reason, per Cult of Mac, is the same DRAM and NAND flash shortage squeezing the entire PC and smartphone industry. Apple is in a better position than its competitors, but it is not immune. The company has stopped selling the 512GB M3 Ultra Mac Studio and stopped accepting orders for some high-memory Mac mini and Mac Studio configurations. The base Mac mini was out of stock at Apple’s online store as of late April.

NVIDIA’s situation is structurally worse. The HBM crisis is not a blip — it is a multi-year reallocation of semiconductor manufacturing capacity from consumer electronics toward AI data centre infrastructure. Each H100 accelerator needs 80GB of HBM3; each Blackwell B200 needs 192GB of HBM3e. Total HBM demand has grown five-fold between 2023 and 2026. SK Hynix, Samsung, and Micron are investing $50 billion combined in new fabs, but new fabs take 18–24 months to come online. In the meantime, NVIDIA has cut consumer GPU production to feed the AI hyperscalers (Microsoft, Google, Amazon, Meta, plus the sovereign-AI programmes now in the G7’s sights). The GeForce RTX 5080 that a local-AI builder would have bought this spring is rationed, and the 5090 is a unicorn.

AMD, almost by accident, has landed in the right place. The Ryzen AI Max+ 395 is not the fastest AI chip you can buy. It is the fastest AI chip you can buy today, at a price a working developer can pay, with the memory architecture that actually matters for running modern open-weight models. The 395 was designed in 2024, well before the HBM shortage became a story, but it has inherited the consumer/local-AI market by default. That is the irony: AMD’s best local-AI product is not a strategic move into the consumer AI market. It is a 2024 chip that the 2026 supply chain happened to leave standing.

What This Means for New Zealand

For a New Zealand buyer, the practical reality of importing a GMKTec EVO-X2 with 128GB runs like this. The unit costs roughly US$2,000–US$2,300 (NZ$3,300–NZ$3,800 at current rates) on AliExpress or Amazon. Standard AliExpress shipping to New Zealand runs seven to fourteen days, depending on the warehouse (most ship from Shenzhen). GST of 15% applies on import over NZ$1,000, plus a customs clearance fee. All-in, expect to pay NZ$3,800 to NZ$4,500 landed for a 128GB unit. The 64GB version lands closer to NZ$2,800.
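The landed-cost range above can be reproduced with simple arithmetic. The sketch below is a simplification: it applies the 15% GST to the purchase price plus shipping, and the NZ$100 clearance fee is a hypothetical placeholder, since the article does not give a figure. Shipping is assumed to be included in the purchase price.

```python
GST_RATE = 0.15  # New Zealand GST, as stated in the article


def landed_cost_nzd(price_nzd: float,
                    shipping_nzd: float = 0.0,
                    clearance_fee_nzd: float = 0.0) -> float:
    """Purchase price plus shipping, GST on both, plus a flat clearance fee."""
    taxable = price_nzd + shipping_nzd
    return taxable + GST_RATE * taxable + clearance_fee_nzd


# Hypothetical placeholder: the article does not state the clearance fee.
CLEARANCE_FEE_NZD = 100

low = landed_cost_nzd(3300, clearance_fee_nzd=CLEARANCE_FEE_NZD)
high = landed_cost_nzd(3800, clearance_fee_nzd=CLEARANCE_FEE_NZD)

print(f"landed cost, 128GB unit: NZ${low:,.0f} to NZ${high:,.0f}")
```

With those assumptions the result falls inside the NZ$3,800 to NZ$4,500 range quoted above; a different fee or exchange rate shifts it.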

There is no local distributor. PB Tech, Computer Lounge, and the usual Mac specialists do not stock GMKTec, Nimo, BOSGAME, or Minisforum products. The warranty is one year, return-to-seller. If the box fails, you ship it back to Shenzhen at your cost. The same caveat applies to ASUS, HP, and the other OEMs shipping Strix Halo laptops — they are available through Parallel Importers and some laptop specialists, but the 128GB configuration is rare and the price premium is significant.

The power draw is 120W TDP under load, peaking at around 140W. It runs on a standard NZ 230V outlet. No special wiring. The cooling is active (a fan), and the unit is not silent under sustained AI load — think “quiet gaming laptop,” not “library.”

The realistic comparison for a New Zealand buyer who wants to run a frontier model locally this year: an M3 Ultra Mac Studio with 256GB (when in stock) at roughly NZ$8,500–NZ$9,500, plus a six-month wait. An EVO-X2 with 128GB at NZ$3,800–NZ$4,500, in stock this week, with 110GB usable for model weights on Linux. The Mac Studio is faster and quieter. The EVO-X2 is half the price, available now, and the memory difference is mostly academic at the model sizes that actually run at usable speed.
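To put the 110GB figure in context, here is a rough check of which quantisation levels let a 235-billion-parameter model fit. It is a sketch under simple assumptions: weight size is parameters times bits per weight, and it ignores the KV cache, activations, and per-block quantisation overhead, all of which need additional memory.

```python
USABLE_GB = 110  # usable memory for weights on Linux, per the article


def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB."""
    return params_billions * bits_per_weight / 8


def max_bits_per_weight(params_billions: float, usable_gb: float) -> float:
    """Largest average bits per weight that still fits in usable_gb."""
    return usable_gb * 8 / params_billions


for bits in (8, 4, 3):
    size = weights_gb(235, bits)
    verdict = "fits" if size <= USABLE_GB else "does not fit"
    print(f"235B at {bits}-bit: {size:.1f} GB, {verdict} in {USABLE_GB} GB")

print(f"max average bits per weight: {max_bits_per_weight(235, USABLE_GB):.2f}")
```

By this estimate a plain 4-bit copy of a 235B model is slightly too large for 110GB, which is consistent with the on-stage demo needing a lower-bit quantisation.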

The other thing worth knowing: the story we ran in May, “The AI Build-Out Is Killing Cheap Smartphones: A DRAM Memory Crisis,” is now hitting the local-AI segment directly. The same HBM allocation that pushed NVIDIA to cut RTX 50 production is the reason your next Mac Studio is delayed. The same shortage is the reason a $2,000 AMD box with 128GB of LPDDR5X is the most interesting piece of consumer AI hardware shipped in 2026.

⚠️ THE OTHER SIDE

Three honest caveats. First, the speed. The 235B Qwen3 on the EVO-X2 is a demonstration of capability, not a daily-driver experience. At 4 to 14 tokens per second on smaller models, the local box is competitive with — but not faster than — cloud APIs at the same model class. The gap closes only on the 20B models, where the EVO-X2 is meaningfully faster than the cloud per-token but you are giving up a model class. Second, the AMD software stack. ROCm on Linux works; ROCm on Windows is improving. Ollama integration is solid. The broader tool ecosystem — IDE plugins, agent frameworks, RAG pipelines, image generation, audio — is less mature than the CUDA / macOS / cloud paths. You will spend more time configuring the box than you would setting up a Cursor subscription. Third, the 235B marketing. Lisa Su showed a 235B model generating text on stage. That is real. It is also misleading. Running a 235B model at 4 tok/s for 30 minutes is not the same experience as running a 235B model at 30 tok/s for 30 seconds. The capability is genuine. The framing is generous.

❓ FREQUENTLY ASKED QUESTIONS

Can a $2,000 mini PC really replace a $440/month AI subscription stack? No. A subscription to Claude Code Max and ChatGPT Pro includes models that are larger and more capable than anything the EVO-X2 can run at usable speed. The box is a supplement, not a replacement. At any subscription level above $80/month, it breaks even in two years on inference cost, even if you keep the cloud tools for the things the local box cannot do.

How fast is the AMD box compared to a Mac Studio M3 Ultra with 256GB? The M3 Ultra has higher memory bandwidth and faster token generation at every model size. It is also silent, runs cool, and integrates with the Apple software stack. The EVO-X2 is half the price, available now, and the memory difference (256GB Apple vs 128GB AMD) does not matter at the model sizes that actually run at usable speed on either box.

Why can’t Apple or NVIDIA just build this? The Mac Studio M5 Ultra is delayed to October 2026 because of the same DRAM and NAND shortage affecting every PC maker. NVIDIA cut GeForce RTX 50-series production 30–40% in the first half of 2026 to redirect HBM to AI data centre customers. AMD’s 395 sidesteps the bottleneck by using LPDDR5X, not HBM — slower, but available.

Is the 235B model claim honest? The capability is real. Nish Tahir’s third-party benchmarks show usable speed only at the 20B model class (24 tok/s). The 120B class runs at 15 tok/s. The 32B class runs at 4 tok/s. The 235B demonstration was likely a low-bit quantisation, short-context run — a “this is possible” moment, not a daily-driver experience.

Where can a New Zealand buyer actually get one? AliExpress and Amazon, shipped from Shenzhen or Hong Kong. Standard shipping to New Zealand is seven to fourteen days. Expect to pay NZ$3,800–NZ$4,500 landed for the 128GB model, including 15% GST and customs clearance. There is no local distributor; warranty is one year, return-to-seller.

Should I wait for the M5 Ultra Mac Studio instead? If you need the Apple software stack, the silence, and the highest possible token generation speed on the largest possible models, yes — and budget for October 2026 at the earliest, with pricing likely to start above NZ$8,000 for the 256GB configuration. If you want to run a 20B model locally this week, the EVO-X2 is the better buy.

Will the AMD box get faster? Almost certainly. The current third-party benchmarks are from October 2025, on an early software stack. AMD’s ROCm driver support for the 395 is maturing, Ollama and llama.cpp are getting regular kernel-level optimisations for Strix Halo, and quantisation techniques are getting better. The 235B speed today (essentially unusable) is not the 235B speed in six months.

Is this the end of the cloud AI stack? No. The frontier still lives in the data centre, and the gap between a 235B local model and a 500B cloud model is meaningful. But the consumer/local AI segment — the one Apple and NVIDIA have been slow to serve — has a credible, available, affordable champion. For the first time in 2026, the question is not “when will the Mac Studio arrive.” It is “do I need to wait for it at all.”