NVIDIA is offering around 80 AI models through hosted APIs — and they’re completely free. MiniMax M2.7, GLM 5.1, Kimi 2.5, DeepSeek 3.2, GPT-OSS-120B, Sarvam-M, and dozens more. All accessible through standard OpenAI-compatible endpoints. All at zero cost.
🔍 THE BOTTOM LINE: NVIDIA’s NIM platform gives you free inference on frontier models that would otherwise cost hundreds of dollars a month through their original providers. If you’re paying for API access to any of these models, you might not need to be.
🚀 What’s Available
The free tier on build.nvidia.com includes:
Frontier LLMs:
- MiniMax M2.7 — 230B parameters, excels at coding and agentic workflows
- GLM 5.1 — Zhipu AI’s latest reasoning model
- Kimi 2.5 — Moonshot’s flagship model
- DeepSeek 3.2 — The latest from DeepSeek’s series
- GPT-OSS-120B — OpenAI's open-weight 120B model
- Sarvam-M — Indian multilingual model
Specialised models:
- Embedding models, reranking models, retrieval models, and vision models are all in the catalogue
The full list runs to 80+ models covering text generation, vision, retrieval, and more.
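Once you have an API key (setup below), you can pull the live catalogue yourself rather than trusting a blog post's snapshot. A minimal sketch using the official `openai` Python client, assuming the endpoint implements the standard `/v1/models` route (OpenAI-compatible servers generally do):

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at NVIDIA's endpoint.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

# Print every model ID the endpoint currently exposes.
for model in client.models.list():
    print(model.id)
```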
🔧 How to Set It Up
Three lines of configuration:

```toml
base_url = "https://integrate.api.nvidia.com/v1"
api_key = "$NVIDIA_API_KEY"
model = "minimaxai/minimax-m2.7"
```
Step 1: Go to build.nvidia.com/models and create a free account.
Step 2: Generate an API key (no credit card required).
Step 3: Point your tool at the NVIDIA endpoint. It works with anything that supports OpenAI-compatible APIs.
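Putting the three steps together in code, here's a minimal sketch with the `openai` Python client. The model ID is the one from the config above; any ID from the catalogue should work:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # step 3: swap the base URL
    api_key=os.environ["NVIDIA_API_KEY"],            # step 2: key from build.nvidia.com
)

response = client.chat.completions.create(
    model="minimaxai/minimax-m2.7",
    messages=[{"role": "user", "content": "Explain rate limiting in one sentence."}],
)
print(response.choices[0].message.content)
```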
Works with:
- OpenClaw — set `base_url` in your provider config
- Cursor IDE — add as a custom OpenAI-compatible provider
- Zed IDE — configure in settings
- Continue — add to `config.json`
- Any OpenAI-compatible client — just swap the base URL
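For editor integrations like the ones above, streaming matters more than raw throughput. The standard `stream=True` flag should work here too; a sketch:

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

stream = client.chat.completions.create(
    model="minimaxai/minimax-m2.7",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,  # tokens arrive incrementally, as an IDE assistant expects
)
for chunk in stream:
    # Each chunk carries a delta; content is empty on some chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```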
💰 Why This Matters
Running MiniMax M2.7 locally would need multiple H100s. DeepSeek 3.2 requires serious GPU memory. Kimi 2.5 isn’t available for self-hosting at all.
NVIDIA is eating the inference cost. For now.
The free tier has rate limits — it’s designed for development and experimentation, not production workloads. But for building, testing, prototyping, and personal projects, it’s genuinely free inference on models that would otherwise cost real money.
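Because of those rate limits, it's worth wrapping calls in a simple retry with exponential backoff. A sketch: the `openai` client raises `RateLimitError` on HTTP 429, and the retry count and delays below are arbitrary choices, not NVIDIA's documented limits:

```python
import os
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

def chat_with_backoff(messages, model="minimaxai/minimax-m2.7", max_retries=5):
    """Retry on 429s with exponential backoff; fine for dev, not production."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s, 8s, 16s
    raise RuntimeError("Still rate-limited after retries")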
🇳🇿 New Zealand Angle
For NZ developers and startups, this is significant. API costs add up fast when you’re paying in USD from NZD. Free inference on frontier models means:
- Prototyping without the bill — test ideas before committing to paid API access
- Education — students and learners can access cutting-edge models for free
- Side projects — build without watching your credit card drain
If you’re running OpenClaw locally (see our offline LLM setup guide), you can use NVIDIA’s free tier for the heavy models and fall back to local inference for lighter tasks.
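One way to wire that up: try the NVIDIA endpoint first, and fall back to a local OpenAI-compatible server when it's throttled or unreachable. A sketch assuming a local server at `http://localhost:11434/v1` (Ollama's default; adjust for your setup) and a hypothetical local model tag:

```python
import os
from openai import OpenAI, APIConnectionError, RateLimitError

# Remote: NVIDIA's free tier for the heavy models.
remote = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)
# Local: an OpenAI-compatible server, e.g. Ollama (assumed URL; key is ignored).
local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

def ask(messages):
    try:
        return remote.chat.completions.create(
            model="minimaxai/minimax-m2.7", messages=messages
        )
    except (RateLimitError, APIConnectionError):
        # Rate-limited or endpoint unreachable: fall back to local inference.
        return local.chat.completions.create(
            model="llama3.1:8b",  # hypothetical local model tag
            messages=messages,
        )
```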
⚠️ The Catch
There is always a catch:
- Rate limits — The free tier is throttled. You’ll hit ceilings if you try to run production traffic through it.
- Data handling — NVIDIA’s terms govern what happens to your prompts and outputs. Read the fine print before sending sensitive data.
- This is a land grab — NVIDIA wants developers building on their platform. The free tier is the hook. When they eventually charge, you’ll already be integrated.
- Availability — Models can be added or removed. What’s free today may not be tomorrow.
Use it. Build on it. But don't make your entire business depend on it.