Liquid AI released LFM2.5-230M, a 230-million-parameter model, on June 25, 2026 as an open-weight download on Hugging Face. It runs at 213 tokens per second on a Samsung Galaxy S25 Ultra and 42 tok/s on a Raspberry Pi 5. Despite its size, it outperforms models more than twice as large on tool use, with a BFCLv3 score of 43.26, and it has already been deployed on a Unitree G1 humanoid robot for natural-language skill control.
🔍 THE BOTTOM LINE
The edge AI future isn’t coming — it’s here, and it fits in your pocket. A 230M parameter model that runs on a Raspberry Pi, controls a humanoid robot, and beats 800M-parameter rivals on tool use is a signal that the frontier isn’t just about getting bigger. It’s about getting smarter with less.
What Changed: The Liquid Approach
What sets LFM2.5 apart isn’t just its size; it’s the architecture. While most competitors still rely on transformer stacks, Liquid AI uses its proprietary “Liquid Foundation Models” (LFM) design, a fundamentally different architecture that the company says achieves dramatically higher throughput per parameter. The model was pre-trained on 19 trillion tokens and supports a 32K-token context window, giving it room for long prompts and multi-step tool exchanges despite its small footprint.
The weights are available immediately on Hugging Face, and Liquid AI shipped day-one support across the entire inference ecosystem: llama.cpp, MLX for Apple Silicon, vLLM and SGLang for GPU serving, and ONNX for cross-platform deployment. Full details are in the Liquid AI documentation. That breadth of support is a deliberate play for adoption — they want this model running everywhere, not just in a lab.
The Robot Angle: From Chat Window to Physical World
The most compelling demonstration of LFM2.5-230M came from its deployment on a Unitree G1 humanoid robot running on an NVIDIA Jetson Orin. The model acts as a skill-selection layer: you give it a natural-language instruction like “hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters” — and it decomposes that into a structured sequence of tool calls invoking pre-trained motor skills from NVIDIA’s SONIC framework.
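To make the skill-selection idea concrete, here is a minimal Python sketch of what a structured plan for the instruction above might look like, and how a thin dispatcher could validate it before anything reaches the robot. The skill names (`hold_still`, `walk`, `kneel`), their parameters, the negative-speed convention for walking backward, and the JSON shape are all hypothetical, chosen for illustration; they are not Liquid AI’s or NVIDIA’s actual schema.

```python
import json

# Hypothetical skill registry: skill name -> required argument names.
# Illustrative assumptions only, not the real SONIC skill set.
SKILLS = {
    "hold_still": {"duration_s"},
    "walk": {"speed_mps", "distance_m"},
    "kneel": {"style", "duration_s"},
}

# What the model's structured output for the example instruction might look like.
# Walking backward is encoded as a negative speed (an assumption).
MODEL_OUTPUT = json.dumps([
    {"name": "hold_still", "arguments": {"duration_s": 2}},
    {"name": "walk", "arguments": {"speed_mps": 1.0, "distance_m": 3}},
    {"name": "kneel", "arguments": {"style": "forward_one_leg", "duration_s": 5}},
    {"name": "walk", "arguments": {"speed_mps": -0.5, "distance_m": 3}},
])


def parse_plan(raw: str) -> list:
    """Parse a JSON list of tool calls; reject unknown skills or wrong arguments."""
    plan = json.loads(raw)
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON list of tool calls")
    for step, call in enumerate(plan):
        if not isinstance(call, dict):
            raise ValueError(f"step {step}: tool call must be an object")
        name = call.get("name")
        if name not in SKILLS:
            raise ValueError(f"step {step}: unknown skill {name!r}")
        args = call.get("arguments", {})
        if set(args) != SKILLS[name]:
            raise ValueError(f"step {step}: {name} expects {sorted(SKILLS[name])}")
    return plan


def estimated_duration_s(plan: list) -> float:
    """Sum explicit durations, plus distance / |speed| for walking steps."""
    total = 0.0
    for call in plan:
        args = call["arguments"]
        if "duration_s" in args:
            total += args["duration_s"]
        else:
            total += args["distance_m"] / abs(args["speed_mps"])
    return total


if __name__ == "__main__":
    plan = parse_plan(MODEL_OUTPUT)
    for call in plan:
        print(call["name"], call["arguments"])
    print("estimated duration (s):", estimated_duration_s(plan))
```

Validating against a fixed registry is one plausible reason a small model can work in this role: it only has to pick from a short list of known skills and fill in numbers, while the motor control itself stays in the pre-trained skills.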
The Jetson Orin belongs to the same family of edge AI modules powering NVIDIA’s humanoid robot foundation model work. The difference: where NVIDIA’s model is a large-scale foundation model for robot perception, Liquid AI’s 230M model is the lightweight control interface that sits on top, the translation layer between “what I want the robot to do” and the actual motor commands.
Why 230M Matters: Efficiency Over Brute Force
The benchmark numbers tell the story: GPQA Diamond 25.41, MMLU-Pro 20.25, IFEval 71.71, BFCLv3 43.26. Translation: it won’t win a trivia contest, but it follows instructions like a champion and plays nicely with external tools. It beats IBM Granite 4.0-H-350M (22.32 GPQA, 61.27 IFEval) and Gemma 3 1B IT (23.89 GPQA, 63.49 IFEval) on instruction following, despite being roughly two-thirds and a quarter of their respective sizes.
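The size and score comparison is simple arithmetic, sketched below in Python using only the figures quoted in this article. The parameter counts are read off the model names (230M, 350M, and a nominal 1,000M for “1B”), which is an assumption; exact counts may differ slightly.

```python
# Parameter counts (millions) and scores as quoted in this article.
# Gemma's 1000M is the nominal size implied by its "1B" name (an assumption).
MODELS = {
    "LFM2.5-230M": {"params_m": 230, "gpqa": 25.41, "ifeval": 71.71},
    "Granite 4.0-H-350M": {"params_m": 350, "gpqa": 22.32, "ifeval": 61.27},
    "Gemma 3 1B IT": {"params_m": 1000, "gpqa": 23.89, "ifeval": 63.49},
}


def compare(baseline: str, rival: str) -> dict:
    """Return the size ratio and score gaps of `baseline` relative to `rival`."""
    b, r = MODELS[baseline], MODELS[rival]
    return {
        "size_ratio": round(b["params_m"] / r["params_m"], 2),
        "gpqa_gap": round(b["gpqa"] - r["gpqa"], 2),
        "ifeval_gap": round(b["ifeval"] - r["ifeval"], 2),
    }


if __name__ == "__main__":
    for rival in ("Granite 4.0-H-350M", "Gemma 3 1B IT"):
        print(rival, compare("LFM2.5-230M", rival))
```

Under these figures, LFM2.5-230M comes out at about 0.66 times the size of Granite and 0.23 times the size of Gemma, with IFEval leads of 10.44 and 8.22 points respectively.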
Liquid AI is explicit about what this model is NOT for: advanced math, code generation, or creative writing. It’s a tool-use and data-extraction specialist. That honesty is refreshing in a market where every model launch claims to be good at everything. For the extreme other end of the efficiency spectrum, see PrismML’s 1-bit edge AI — 8B parameters in 1.15 GB. Different trade-off, same trajectory.
NZ Angle: AI That Works When the Fibre Cuts Out
For New Zealand, where connectivity outside the main centres can be patchy, the ability to run capable AI models locally isn’t a luxury — it’s a necessity. The Raspberry Pi 5 numbers matter here: 42 tok/s on an $80 board means a school robotics club in Greymouth can run real AI inference without a cloud subscription. A rural agricultural monitoring system can process sensor data on-device without sending it to a server in Sydney.
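As a rough feel for what those throughput figures mean in practice, the sketch below converts tokens per second into wall-clock time for a reply. It assumes the quoted rates describe generation (decode) speed and ignores prompt processing and model load time, neither of which the article specifies; the 150-token reply length is an arbitrary example.

```python
# Throughput figures (tokens per second) as quoted in this article.
RATES = {
    "Samsung Galaxy S25 Ultra": 213.0,
    "Raspberry Pi 5": 42.0,
}


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `num_tokens` at a steady rate (decode only)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    reply_tokens = 150  # arbitrary example length
    for device, rate in RATES.items():
        seconds = generation_time_s(reply_tokens, rate)
        print(f"{device}: {seconds:.1f} s for {reply_tokens} tokens")
```

On these assumptions, a 150-token reply takes about 3.6 seconds on the Raspberry Pi 5 and about 0.7 seconds on the phone.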
This democratisation of AI processing is particularly relevant for NZ’s distributed economy. Small businesses and research groups in regional areas aren’t held hostage by expensive cloud API calls. A 230M model running at 42 tok/s on consumer hardware offers exactly the kind of resilience the Kiwi DIY ethic demands.
The Competition: Small Models, Big Stakes
LFM2.5-230M enters a crowded field. IBM Granite 4.0-H-350M, Qwen3.5-0.8B, and Gemma 3 1B IT are all chasing the same efficiency frontier. Liquid AI’s edge: open weights with no usage restrictions, the LFM architecture’s proven throughput advantage on CPU, and demonstrated real-world deployment on a humanoid robot. For a closer look at the Qwen family at a larger scale, see Qwen 3 local dev model.
The positioning is clear: when deployment constraints — power, memory, latency — matter more than academic benchmark wins, LFM2.5-230M is the model you reach for.
❓ FAQ
Q: Is LFM2.5-230M suitable for complex reasoning tasks? A: No. Liquid AI explicitly advises against using it for advanced math, code generation, or creative writing. Its strength is structured tool use, instruction following, and data extraction.
Q: What does “open-weight” mean for developers? A: The model weights are freely available on Hugging Face. You can download, fine-tune, and deploy without commercial licences or API costs.
Q: How fast is it really? A: 213 tokens/second on a Samsung Galaxy S25 Ultra and 42 tok/s on a Raspberry Pi 5 — both in real-time-interaction territory.
Q: Can I actually use this on a robot? A: Yes. Liquid AI demonstrated it on a Unitree G1 humanoid via an NVIDIA Jetson Orin, where it handled natural-language skill control for complex motor commands.
Q: How does it compare to Qwen3.5 or Granite? A: It beats both on instruction following (IFEval: 71.71 vs 59.94 for Qwen3.5-0.8B and 61.27 for Granite 4.0-H-350M) and on tool use (BFCLv3: 43.26 vs 35.08 and 43.07), despite being smaller than both. The tool-use lead over Granite is narrow, at 0.19 points.
🔍 THE BOTTOM LINE
LFM2.5-230M is a statement piece from Liquid AI: efficiency and accessibility are the new frontiers of AI deployment. For Kiwi innovators building physical products or needing reliable local processing power, this model represents a genuine leap — capable AI that’s practical, affordable, and truly edge-native. The question isn’t whether small models will matter. It’s how fast the frontier shrinks.