AI Model Pricing Guide

Because tokens cost money and you're not made of it | Updated May 09, 2026

⚠️ xAI RETIREMENT WARNING: Grok 4 / 4.1 Fast / Code Fast 1 retire on May 15, 2026 (6 days away!). Migrate to Grok 4.3 ($1.25/$2.50) for better performance, but note it is not the same price: the retiring models are listed at $0.20 input. Details below.
CHEAPEST: Qwen 3.6 Plus ($0.10/$0.30) / Llama 4 Scout ($0.15/$0.55) / GPT-5 nano ($0.05 in)
BEST VALUE: Claude Sonnet 4.6 ($3/$15) / Grok 4.3 ($1.25/$2.50)
SMARTEST: GPT-5.5 Pro ($30/$180) / GPT-5.5 ($5/$30), #1 across benchmarks
⚠️ RETIRING MAY 15: Grok 4 / 4.1 Fast / Code Fast 1 → Migrate to Grok 4.3
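Every price in this guide is quoted as input/output dollars per 1M tokens, so a per-request cost is a simple weighted sum. The sketch below is one way to compare the headline models for a given workload; the prices are copied from the summary above, while the function names and the 10,000-in / 2,000-out workload are illustrative assumptions, not anything a provider publishes.

```python
# Prices in USD per 1M tokens as (input, output), copied from the summary above.
PRICES = {
    "Qwen 3.6 Plus": (0.10, 0.30),
    "Llama 4 Scout": (0.15, 0.55),
    "Grok 4.3": (1.25, 2.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.5 Pro": (30.00, 180.00),
}


def request_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one request, given per-1M-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_cost(input_tokens, output_tokens):
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        name: request_cost(input_tokens, output_tokens, in_price, out_price)
        for name, (in_price, out_price) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for name, cost in rank_by_cost(10_000, 2_000):
        print(f"{name:<20} ${cost:.4f}")
```

Output-heavy workloads shift the ranking toward models with cheap output, so it is worth plugging in your own token counts rather than comparing input prices alone.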

OpenAI

The OG of AI APIs. GPT kicked off the revolution and they're still leading.

BUDGET
GPT-5 mini
Fast & cheap
In: $0.25 per 1M | Out: $2.00 per 1M
128K ctx | Fast
Lightweight champion. Surprisingly capable for simple tasks and high-volume apps.
Best: Chatbots, simple QA, data extraction
BUDGET
o4-mini
Reinforcement tuned
In: $1.10 per 1M | Out: $4.40 per 1M
200K ctx | Fine-tuning
Price dropped 70%. Optimized for reinforcement fine-tuning workflows. Create custom reasoning patterns.
Best: Fine-tuning, custom reasoning
NEW
GPT-5.4 mini
Coding & subagents
In: $0.75 per 1M | Out: $4.50 per 1M
Cached input | Coding
New GPT-5.4-class model. Stronger than GPT-5 mini for coding and subagent workflows.
Best: Coding, subagents, mid-tier apps
NEW
GPT-5.4 nano
Cheapest 5.4-class
In: $0.20 per 1M | Out: $1.25 per 1M
Cached input | Budget
Cheapest way into the GPT-5.4 family. Cheaper input than GPT-5 mini.
Best: High-volume, budget apps
POWER
GPT-5.2
Reasoning beast
In: $1.75 per 1M | Out: $14.00 per 1M
6.6h horizon | 200K ctx
Top 3 on METR. Excels at complex tasks, code, and multi-step reasoning.
Best: Code, analysis, agent workflows
POWER
GPT-5.2 Pro
Reasoning premium
In: $21.00 per 1M | Out: $168.00 per 1M
200K ctx | Premium
OpenAI's most precise reasoning model. For when you need the absolute best reasoning.
Best: Hardest problems, precision work
FLAGSHIP
GPT-5.4 Pro
Previous premium — now #3
In: $30.00 per 1M | Out: $180.00 per 1M
270K ctx | Premium
Former #1, now behind GPT-5.5. Still incredibly powerful for demanding tasks.
Best: Most demanding tasks, unlimited budget

Anthropic

Safety-first company. Claude is beloved by developers for being genuinely helpful.

FAST
Claude Haiku 4.5
Speed demon
In: $1.00 per 1M | Out: $5.00 per 1M
200K ctx | Fastest
Optimized for fast responses. Perfect for real-time apps and bulk processing.
Best: Real-time chat, bulk processing
POWER
Claude Opus 4.6
Proven workhorse
In: $5.00 per 1M | Out: $25.00 per 1M
14.5h horizon | 200K ctx | Fast mode
Still one of the best. 14+ hour autonomous tasks. Reliable, consistent, now the value play vs 4.7.
Best: Hard problems, research, complex agents
NEW
Claude Mythos Preview
Frontier intelligence
In: $25.00 per 1M | Out: $125.00 per 1M
1M ctx | Preview
Anthropic's new frontier tier. 5x Opus 4.7 pricing. Invitation-only through Project Glasswing cybersecurity initiative. Found thousands of zero-days pre-release.
Best: Cybersecurity, frontier research, if you can get access

xAI

Elon's AI. Grok has real-time X data access and absurdly low pricing.

NEW
Grok 4.20
Same price as 4.3, more features
In: $1.25 per 1M | Out: $2.50 per 1M
2M ctx | Reasoning | Multi-agent | Vision
Same pricing as Grok 4.3 with multi-agent orchestration. Cached input at $0.125/1M. 2M context window for complex agent swarms.
Best: Complex multi-agent workflows
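Cached input matters most for agent workloads that resend the same long prefix on every call. As a rough sketch using the Grok 4.20 prices above, the function below blends cached and uncached input; the `cache_hit_ratio` parameter is a hypothetical knob, and the model assumes cached tokens are billed at the flat cached rate with no separate cache-write fee, which may not match actual billing.

```python
# Grok 4.20 prices in USD per 1M tokens, taken from the card above.
IN_PRICE = 1.25          # uncached input
CACHED_IN_PRICE = 0.125  # cached input
OUT_PRICE = 2.50         # output


def cost_with_cache(input_tokens, output_tokens, cache_hit_ratio):
    """Estimate request cost when a fraction of the input is served from cache.

    cache_hit_ratio is a hypothetical parameter in [0, 1]: the share of input
    tokens billed at the cached rate.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    fresh = input_tokens - cached
    total = fresh * IN_PRICE + cached * CACHED_IN_PRICE + output_tokens * OUT_PRICE
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent loop: 1M input tokens, 100K output tokens.
    for ratio in (0.0, 0.5, 0.8):
        cost = cost_with_cache(1_000_000, 100_000, ratio)
        print(f"cache hit ratio {ratio:.0%}: ${cost:.3f}")
```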
RETIRING MAY 15
Grok 4 / 4.1 Fast
⚠️ Will stop working May 15
In: $0.20 per 1M | Out: $0.50 per 1M
2M ctx | Retiring
Being retired May 15. Migrate to Grok 4.3 ($1.25/$2.50) for better performance, or Grok 4.20 ($1.25/$2.50) for multi-agent. Both are far more capable.
Best: → Migrate to: Grok 4.3
RETIRING MAY 15
Grok Code Fast 1
⚠️ Will stop working May 15
In: $0.20 per 1M | Out: $1.50 per 1M
256K ctx | Retiring
Being retired May 15. Migrate to Grok 4.3 ($1.25/$2.50) or Grok 4.20 ($1.25/$2.50) for coding — both handle code well.
Best: → Migrate to: Grok 4.3
BUDGET
Grok 3 Mini
Older gen cheap
In: $0.30 per 1M | Out: $0.50 per 1M
131K ctx | Reasoning
Budget fallback if Grok 4's 2M context is overkill for your use case.
Best: Simple tasks, testing
POWER
Grok 4-0709
Premium tier
In: $3.00 per 1M | Out: $15.00 per 1M
256K ctx | Reasoning | Vision
Premium Grok. Smaller context but more reasoning power.
Best: Grok style with more smarts

Google DeepMind

Gemini has quietly become excellent. Massive context, strong multimodal, and a generous free tier.

NEW
Gemini 3.1 Flash-Lite
Cheapest Gemini 3
In: $0.25 per 1M | Out: $1.50 per 1M
Preview | Budget
Cheapest way into Gemini 3.1. Preview tier with budget-friendly pricing.
Best: Budget Gemini 3 apps, prototyping
NEW
Gemini 3 Flash
New budget
In: $0.50 per 1M | Out: $3.00 per 1M
Preview | Fast
Gemini 3 Flash preview. Balanced performance at budget pricing.
Best: Budget apps, prototyping
VALUE
Gemini 2.5 Flash
Best value
In: $0.30 per 1M | Out: $2.50 per 1M
1M ctx | Multimodal | Free tier
One of the cheaper ways to process 1M context, though Gemini 2.5 Flash-Lite undercuts it at $0.10/$0.40. Free tier available. Multimodal: images, video, audio.
Best: High-volume, multimodal, prototypes
NEW
Gemini 2.5 Flash-Lite
Ultra-cheap Flash
In: $0.10 per 1M | Out: $0.40 per 1M
1M ctx | Budget
Flash-Lite tier for Gemini 2.5. Cheaper than standard Flash with 1M context support. Best for high-volume simple tasks.
Best: High-volume, simple tasks, cost-sensitive apps
Gemini 3 Pro
3rd gen flagship
In: $2.00 per 1M | Out: $12.00 per 1M
1M ctx
Third-generation Gemini Pro. Now stable, with no Preview tag. Same $2.00/$12.00 pricing as the preview tier. Strong general-purpose flagship.
Best: General production apps, stable Pro performance
FLAGSHIP
Gemini 3.1 Pro Preview
New flagship
In: $2.00 per 1M | Out: $12.00 per 1M
4h horizon | Preview | Video
77.1% ARC-AGI-2. Price increased from $1.25/$10. Batch and Flex tiers at 50% off.
Best: Video analysis, complex reasoning
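If a job can tolerate delayed results, the 50% Batch/Flex discount mentioned above halves the bill. The sketch below applies that discount to the Gemini 3.1 Pro Preview list prices; it assumes the discount applies equally to input and output tokens, and the function and constant names are illustrative.

```python
# Gemini 3.1 Pro Preview list prices in USD per 1M tokens, from the card above.
IN_PRICE = 2.00
OUT_PRICE = 12.00
BATCH_DISCOUNT = 0.50  # "Batch and Flex tiers at 50% off"


def job_cost(input_tokens, output_tokens, batch=False):
    """Cost of a job at list price, or at the discounted batch rate.

    Assumes the discount applies uniformly to input and output tokens.
    """
    standard = (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000
    return standard * (1 - BATCH_DISCOUNT) if batch else standard


if __name__ == "__main__":
    # Hypothetical nightly job: 20M input tokens, 2M output tokens.
    print(f"standard: ${job_cost(20_000_000, 2_000_000):.2f}")
    print(f"batch:    ${job_cost(20_000_000, 2_000_000, batch=True):.2f}")
```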
LONG
Gemini 2.5 Pro
Long outputs
In: $1.25 per 1M | Out: $10.00 per 1M
1M ctx | 64K output
Cheaper than 3.1 Pro ($1.25/$10 vs $2/$12), with 64K max output vs 16K. Choose for long-form content generation.
Best: Long-form writing, large outputs
DEPRECATED
Gemini 2.0 Flash
Shuts down Jun 1
In: $0.15 per 1M | Out: $0.60 per 1M
1M ctx | 8K output | Retiring Jun 1
Deprecated — shuts down June 1, 2026. Migrate to Gemini 2.5 Flash or 3.1 Flash-Lite.
Best: Migrate away from this model

Open Source & Local

Open-weight models you can run yourself or call via cheap APIs. The frontier is no longer closed.

NEW
Kimi K2.6
88% cheaper than Opus
In: $0.60 per 1M | Out: $2.50 per 1M
256K ctx | Open weight | MoE 1T/32B
Beats GPT-5.4 and Opus 4.6 on SWE-Bench Pro. 1T params, 32B active. 300 sub-agent orchestration. OpenAI-compatible API.
Best: Coding, agents, long-horizon tasks
CHEAPEST
Qwen 3.6 Plus
1M context, free tier
In: $0.10 per 1M | Out: $0.30 per 1M
1M ctx | Reasoning | Free tier
Alibaba's latest. Mandatory chain-of-thought reasoning. Free tier available. Topped 6 coding benchmarks on release.
Best: Budget coding, massive context
NEW
Llama 4 Scout
10M context MoE
In: $0.15 per 1M | Out: $0.55 per 1M
10M ctx | Open weight | MoE 109B
Longest context of any open model. 109B total, 17B active. Multimodal. Runs on 24GB VRAM.
Best: Massive context, multimodal, local
Llama 4 Maverick
Frontier coding MoE
In: $0.20 per 1M | Out: $0.80 per 1M
1M ctx | Open weight | MoE 400B
Beats GPT-4o on coding. 400B total, 17B active. 128 experts. Frontier quality at MoE prices.
Best: Coding, complex reasoning
DeepSeek V3.2
Matches GPT-4o
In: $0.27 per 1M | Out: $1.10 per 1M
128K ctx | Open weight | MoE 685B
94.2% MMLU matching GPT-4o. 685B MoE with 37B active. Best open model for general knowledge.
Best: General knowledge, research
FREE
Qwen3-Coder 8B
Local coding king
In: $0.00 per 1M (FREE, local) | Out: $0.00 per 1M (FREE, local)
32K ctx | Local only | 8B dense
Runs on any 8GB GPU. 92 programming languages. 80-150 tok/s. Best local coding model under 10B.
Best: Local coding, autocomplete
FREE
DeepSeek R1 Distill 14B
Local reasoning
In: $0.00 per 1M (FREE, local) | Out: $0.00 per 1M (FREE, local)
Local only | Reasoning | 10GB VRAM
Chain-of-thought reasoning on 10GB VRAM. The sweet spot for local reasoning. 55 tok/s on modern GPUs.
Best: Local reasoning, budget hardware
💡 Did You Know?
1M tokens = 750K words
That's roughly 1,500 pages. Process it for $0.05 with GPT-5 nano — the cheapest input ever.
Grok vs Opus
Send 50x more output tokens through Grok 4.1 Fast for the same price as Opus 4.7 output. $0.50 vs $25.
Gemini 3.1 Pro price hike
Gemini 3.1 Pro jumped from $1.25/$10 to $2.00/$12. No longer matches GPT-5 pricing — now more expensive.
o4-mini price crash
o4-mini dropped from $4/$16 to $1.10/$4.40, a price cut of roughly 70% (72.5% at these exact prices). Now genuinely competitive with Haiku.
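The arithmetic behind these facts is easy to reproduce. The sketch below is a small checker using only figures quoted in this section; the 500 words-per-page value is an assumption implied by "750K words ≈ 1,500 pages", and the helper names are illustrative.

```python
WORDS_PER_TOKEN = 0.75  # from "1M tokens = 750K words"
WORDS_PER_PAGE = 500    # assumption implied by 750K words ~ 1,500 pages


def pages_to_tokens(pages):
    """Approximate token count for a given number of pages."""
    return pages * WORDS_PER_PAGE / WORDS_PER_TOKEN


def input_cost(tokens, price_per_million):
    """Dollar cost of sending `tokens` input tokens."""
    return tokens * price_per_million / 1_000_000


def price_ratio(expensive, cheap):
    """How many times more tokens the cheaper price buys per dollar."""
    return expensive / cheap


def percent_change(old, new):
    """Signed percentage change from old to new (negative means a cut)."""
    return (new - old) / old * 100


if __name__ == "__main__":
    tokens = pages_to_tokens(1_500)
    print(f"1,500 pages: {tokens:,.0f} tokens, ${input_cost(tokens, 0.05):.2f} at $0.05/1M")
    print(f"Grok 4.1 Fast vs Opus 4.7 output: {price_ratio(25.00, 0.50):.0f}x")
    print(f"Gemini 3.1 Pro input: {percent_change(1.25, 2.00):+.1f}%")
    print(f"Gemini 3.1 Pro output: {percent_change(10.00, 12.00):+.1f}%")
    print(f"o4-mini input: {percent_change(4.00, 1.10):+.1f}%")
    print(f"o4-mini output: {percent_change(16.00, 4.40):+.1f}%")
```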