AI Model Pricing Guide
Because tokens cost money and you're not made of it | Updated May 09, 2026
CHEAPEST: Qwen 3.6 Plus ($0.10/$0.30) / Llama 4 Scout ($0.15/$0.55) / GPT-5 nano ($0.05 in)
BEST VALUE: Claude Sonnet 4.6 ($3/$15) / Grok 4.3 ($1.25/$2.50)
SMARTEST: GPT-5.5 Pro ($30/$180) / GPT-5.5 ($5/$30), #1 across benchmarks
⚠️ RETIRING MAY 15: Grok 4 / 4.1 Fast / Code Fast 1 → Migrate to Grok 4.3
OpenAI
The OG of AI APIs. GPT kicked off the revolution and they're still leading.
BUDGET
GPT-5 mini
Fast & cheap
In
$0.25
per 1M
Out
$2.00
per 1M
128K ctx | Fast
Lightweight champion. Surprisingly capable for simple tasks and high-volume apps.
Best: Chatbots, simple QA, data extraction
BUDGET
o4-mini
Reinforcement tuned
In
$1.10
per 1M
Out
$4.40
per 1M
200K ctx | Fine-tuning
Price dropped about 72% (from $4/$16). Optimized for reinforcement fine-tuning workflows, so you can build custom reasoning patterns.
Best: Fine-tuning, custom reasoning
NEW
GPT-5.4 mini
Coding & subagents
In
$0.75
per 1M
Out
$4.50
per 1M
Cached input | Coding
New GPT-5.4-class model. Stronger than GPT-5 mini for coding and subagent workflows.
Best: Coding, subagents, mid-tier apps
NEW
GPT-5.4 nano
Cheapest 5.4-class
In
$0.20
per 1M
Out
$1.25
per 1M
Cached input | Budget
Cheapest way into the GPT-5.4 family. Cheaper input than GPT-5 mini.
Best: High-volume, budget apps
POWER
GPT-5.4
Still elite — now #2
In
$2.50
per 1M
Out
$15.00
per 1M
270K ctx | Reasoning
OpenAI's previous #1. Still elite for complex, multi-step problems.
Best: Hardest problems, professional work
POWER
GPT-5.2
Reasoning beast
In
$1.75
per 1M
Out
$14.00
per 1M
6.6h horizon | 200K ctx
Top 3 on METR. Excels at complex tasks, code, and multi-step reasoning.
Best: Code, analysis, agent workflows
POWER
GPT-5.2 Pro
Reasoning premium
In
$21.00
per 1M
Out
$168.00
per 1M
200K ctx | Premium
OpenAI's most precise reasoning model. For when you need the absolute best reasoning.
Best: Hardest problems, precision work
#1 RANKED
GPT-5.5
New #1 — smarter, faster, cheaper
In
$5.00
per 1M
Out
$30.00
per 1M
1.05M ctx | Reasoning | Agents
New #1 across benchmarks. 82.7% Terminal-Bench, 84.9% GDPval, 78.7% OSWorld. 1.05M context window. Beats GPT-5.4 and Opus 4.7 everywhere while using fewer tokens.
Best: Coding, agents, research, multi-step tasks — the new standard
#1 RANKED
GPT-5.5 Pro
Maximum intelligence
In
$30.00
per 1M
Out
$180.00
per 1M
1.05M ctx | Premium | Deep Research
90.1% BrowseComp, 52.4% FrontierMath Tier 1-3. 1.05M context window — read entire codebases and research libraries. The ceiling for what AI can do right now.
Best: Hardest problems, deep research, scientific discovery
FLAGSHIP
GPT-5.4 Pro
Previous premium — now #3
In
$30.00
per 1M
Out
$180.00
per 1M
270K ctx | Premium
Former #1, now behind GPT-5.5. Still incredibly powerful for demanding tasks.
Best: Most demanding tasks, unlimited budget
Anthropic
Safety-first company. Claude is beloved by developers for being genuinely helpful.
FAST
Claude Haiku 4.5
Speed demon
In
$1.00
per 1M
Out
$5.00
per 1M
200K ctx | Fastest
Optimized for fast responses. Perfect for real-time apps and bulk processing.
Best: Real-time chat, bulk processing
BEST
Claude Sonnet 4.6
The sweet spot
In
$3.00
per 1M
Out
$15.00
per 1M
Balanced | 200K ctx
Best balance of speed and smarts. Most developers find this is all they need.
Best: Most tasks, code, writing, general use
NEW
Claude Opus 4.7
New SOTA
In
$5.00
per 1M
Out
$25.00
per 1M
1M ctx | xhigh reasoning | Self-verify
Anthropic's best. 1M context, autonomous self-verification. Beat GPT-5.4 on BrowseComp but overtaken by GPT-5.5 on coding and agents.
Best: Complex coding, agents, long-horizon tasks
POWER
Claude Opus 4.6
Proven workhorse
In
$5.00
per 1M
Out
$25.00
per 1M
14.5h horizon | 200K ctx | Fast mode
Still one of the best. 14+ hour autonomous tasks. Reliable, consistent, now the value play vs 4.7.
Best: Hard problems, research, complex agents
NEW
Claude Mythos Preview
Frontier intelligence
In
$25.00
per 1M
Out
$125.00
per 1M
1M ctx | Preview
Anthropic's new frontier tier. 5x Opus 4.7 pricing. Invitation-only through Project Glasswing cybersecurity initiative. Found thousands of zero-days pre-release.
Best: Cybersecurity, frontier research, if you can get access
xAI
NEW
Grok 4.20
Same price as 4.3, more features
In
$1.25
per 1M
Out
$2.50
per 1M
2M ctx | Reasoning | Multi-agent | Vision
Same pricing as Grok 4.3 with multi-agent orchestration. Cached input at $0.125/1M. 2M context window for complex agent swarms.
Best: Complex multi-agent workflows
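To see what the cached-input rate means in practice, here is a minimal sketch of a per-request cost estimate using the Grok 4.20 prices above. It assumes cached tokens are a subset of the input tokens and are billed at the cached rate instead of the full input rate; the function name and the example workload are hypothetical, and actual billing rules may differ.

```python
def request_cost(input_tokens, cached_tokens, output_tokens,
                 in_price=1.25, cached_price=0.125, out_price=2.50):
    """Estimate the USD cost of one request.

    Prices are per 1M tokens (Grok 4.20 figures from the card above).
    Assumption: cached_tokens is the part of input_tokens served from cache.
    """
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * in_price
             + cached_tokens * cached_price
             + output_tokens * out_price)
    return total / 1_000_000


# Hypothetical agent call: 100K-token prompt, 80K of it cached, 2K output.
with_cache = request_cost(100_000, 80_000, 2_000)
without_cache = request_cost(100_000, 0, 2_000)
print(f"with cache: ${with_cache:.3f}, without: ${without_cache:.3f}")
```

In this example the cached call costs about $0.04 against $0.13 uncached, which is why long shared prompts benefit most from caching.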
NEW
Grok 4.3
New recommended base model
In
$1.25
per 1M
Out
$2.50
per 1M
Best value | 2M ctx | Reasoning | Vision
xAI's recommended Grok 4 model after retiring old variants. Beats Grok 4.1 on coding, agents, and reasoning. This is the migration target for retiring models.
Best: High-volume apps, X analysis, multi-agent — the new default Grok
RETIRING MAY 15
Grok 4 / 4.1 Fast
⚠️ Will stop working May 15
In
$0.20
per 1M
Out
$0.50
per 1M
2M ctx | Retiring
Being retired May 15. Migrate to Grok 4.3 ($1.25/$2.50) for better performance, or Grok 4.20 ($1.25/$2.50) for multi-agent. Both are far more capable.
Best: → Migrate to: Grok 4.3
RETIRING MAY 15
Grok Code Fast 1
⚠️ Will stop working May 15
In
$0.20
per 1M
Out
$1.50
per 1M
256K ctx | Retiring
Being retired May 15. Migrate to Grok 4.3 ($1.25/$2.50) or Grok 4.20 ($1.25/$2.50) for coding — both handle code well.
Best: → Migrate to: Grok 4.3
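The migration target is more capable but also more expensive per token, so it may be worth estimating the bill before switching. The sketch below compares monthly cost on the retiring models against Grok 4.3 using the list prices from the cards above; the workload (100M input and 20M output tokens per month) is a made-up example.

```python
# Per-1M-token (input, output) prices in USD, copied from the cards above.
RETIRING = {
    "Grok 4 / 4.1 Fast": (0.20, 0.50),
    "Grok Code Fast 1": (0.20, 1.50),
}
GROK_4_3 = (1.25, 2.50)


def monthly_cost(prices, input_millions, output_millions):
    """Cost in USD for a month's traffic, given token volumes in millions."""
    in_price, out_price = prices
    return in_price * input_millions + out_price * output_millions


# Hypothetical workload: 100M input tokens and 20M output tokens per month.
for name, prices in RETIRING.items():
    before = monthly_cost(prices, 100, 20)
    after = monthly_cost(GROK_4_3, 100, 20)
    print(f"{name}: ${before:.2f} -> ${after:.2f} ({after / before:.1f}x)")
```

For this workload the bill goes from $30 to $175 when leaving Grok 4 / 4.1 Fast, and from $50 to $175 when leaving Grok Code Fast 1.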
BUDGET
Grok 3 Mini
Older gen cheap
In
$0.30
per 1M
Out
$0.50
per 1M
131K ctx | Reasoning
Budget fallback if Grok 4's 2M context is overkill for your use case.
Best: Simple tasks, testing
POWER
Grok 4-0709
Premium tier
In
$3.00
per 1M
Out
$15.00
per 1M
256K ctx | Reasoning | Vision
Premium Grok. Smaller context but more reasoning power.
Best: Grok style with more smarts
Google DeepMind
Gemini has quietly become excellent. Massive context, strong multimodal, and a generous free tier.
NEW
Gemini 3.1 Flash-Lite
Cheapest Gemini 3
In
$0.25
per 1M
Out
$1.50
per 1M
Preview | Budget
Cheapest way into Gemini 3.1. Preview tier with budget-friendly pricing.
Best: Budget Gemini 3 apps, prototyping
NEW
Gemini 3 Flash
New budget
In
$0.50
per 1M
Out
$3.00
per 1M
Preview | Fast
Gemini 3 Flash preview. Balanced performance at budget pricing.
Best: Budget apps, prototyping
VALUE
Gemini 2.5 Flash
Best value
In
$0.30
per 1M
Out
$2.50
per 1M
1M ctx | Multimodal | Free tier
Cheapest way to process 1M context. Free tier available. Multimodal - images, video, audio.
Best: High-volume, multimodal, prototypes
NEW
Gemini 2.5 Flash-Lite
Ultra-cheap Flash
In
$0.10
per 1M
Out
$0.40
per 1M
1M ctx | Budget
Flash-Lite tier for Gemini 2.5. Cheaper than standard Flash with 1M context support. Best for high-volume simple tasks.
Best: High-volume, simple tasks, cost-sensitive apps
Gemini 3 Pro
3rd gen flagship
In
$2.00
per 1M
Out
$12.00
per 1M
1M ctx
Third-generation Gemini Pro. Now stable, with no Preview tag. Same $2.00/$12.00 pricing as the preview tier. Strong general-purpose flagship.
Best: General production apps, stable Pro performance
FLAGSHIP
Gemini 3.1 Pro Preview
New flagship
In
$2.00
per 1M
Out
$12.00
per 1M
4h horizon | Preview | Video
77.1% ARC-AGI-2. Price increased from $1.25/$10. Batch and Flex tiers at 50% off.
Best: Video analysis, complex reasoning
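As a quick illustration of the 50% Batch and Flex discount, the sketch below applies it to the Gemini 3.1 Pro list prices from this card. The helper name is hypothetical, and it assumes the discount applies equally to input and output tokens.

```python
GEMINI_3_1_PRO = (2.00, 12.00)  # (input, output) USD per 1M tokens, from the card


def discounted(prices, discount=0.50):
    """Apply a fractional discount to an (input, output) price pair."""
    return tuple(price * (1 - discount) for price in prices)


batch_in, batch_out = discounted(GEMINI_3_1_PRO)
print(f"Batch/Flex: ${batch_in:.2f} in / ${batch_out:.2f} out per 1M")
```

Under that assumption the discounted rate works out to $1.00/$6.00, below the model's old $1.25/$10 list price.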
LONG
Gemini 2.5 Pro
Long outputs
In
$1.25
per 1M
Out
$10.00
per 1M
1M ctx | 64K output
Cheaper than 3.1 Pro ($1.25/$10 vs $2/$12), with 64K max output vs 16K. Choose it for long-form content generation.
Best: Long-form writing, large outputs
DEPRECATED
Gemini 2.0 Flash
Shuts down Jun 1
In
$0.15
per 1M
Out
$0.60
per 1M
1M ctx | 8K output | Retiring Jun 1
Deprecated — shuts down June 1, 2026. Migrate to Gemini 2.5 Flash or 3.1 Flash-Lite.
Best: Migrate away from this model
Open Source & Local
Open-weight models you can run yourself or call via cheap APIs. The frontier is no longer closed.
NEW
Kimi K2.6
88% cheaper than Opus
In
$0.60
per 1M
Out
$2.50
per 1M
256K ctx | Open weight | MoE 1T/32B
Beats GPT-5.4 and Opus 4.6 on SWE-Bench Pro. 1T params, 32B active. 300 sub-agent orchestration. OpenAI-compatible API.
Best: Coding, agents, long-horizon tasks
CHEAPEST
Qwen 3.6 Plus
1M context, free tier
In
$0.10
per 1M
Out
$0.30
per 1M
1M ctx | Reasoning | Free tier
Alibaba's latest. Mandatory chain-of-thought reasoning. Free tier available. Topped 6 coding benchmarks on release.
Best: Budget coding, massive context
NEW
Llama 4 Scout
10M context MoE
In
$0.15
per 1M
Out
$0.55
per 1M
10M ctx | Open weight | MoE 109B
Longest context of any open model. 109B total, 17B active. Multimodal. Runs on 24GB VRAM.
Best: Massive context, multimodal, local
Llama 4 Maverick
Frontier coding MoE
In
$0.20
per 1M
Out
$0.80
per 1M
1M ctx | Open weight | MoE 400B
Beats GPT-4o on coding. 400B total, 17B active. 128 experts. Frontier quality at MoE prices.
Best: Coding, complex reasoning
DeepSeek V3.2
Matches GPT-4o
In
$0.27
per 1M
Out
$1.10
per 1M
128K ctx | Open weight | MoE 685B
94.2% MMLU matching GPT-4o. 685B MoE with 37B active. Best open model for general knowledge.
Best: General knowledge, research
FREE
Qwen3-Coder 8B
Local coding king
In
FREE
local
Out
FREE
local
32K ctx | Local only | 8B dense
Runs on any 8GB GPU. 92 programming languages. 80-150 tok/s. Best local coding model under 10B.
Best: Local coding, autocomplete
FREE
DeepSeek R1 Distill 14B
Local reasoning
In
FREE
local
Out
FREE
local
Local only | Reasoning | 10GB VRAM
Chain-of-thought reasoning on 10GB VRAM. The sweet spot for local reasoning. 55 tok/s on modern GPUs.
Best: Local reasoning, budget hardware
💡 Did You Know?
1M tokens = 750K words
That's roughly 1,500 pages. Process it for $0.05 with GPT-5 nano, the cheapest API input price in this guide.
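The conversion above can be sketched as a couple of helper functions. The 0.75 words-per-token and 500 words-per-page figures are rules of thumb implied by the numbers in this tip, not exact values; real tokenizers vary by model and language.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the tip: 1M tokens is about 750K words
WORDS_PER_PAGE = 500    # implied by 750K words being about 1,500 pages


def tokens_to_pages(tokens):
    """Rough page count for a given number of tokens."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


def input_cost(tokens, price_per_million):
    """USD cost to send `tokens` input tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


print(tokens_to_pages(1_000_000))          # 1500.0
print(f"${input_cost(1_000_000, 0.05):.2f}")  # $0.05
```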
Grok vs Opus
Grok 4.1 Fast output costs $0.50 per 1M tokens versus $25 for Opus 4.7, so the same spend buys 50x more output tokens. Grok 4.1 Fast retires May 15; at Grok 4.3's $2.50 output price the gap is 10x.
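The ratio comes straight from the per-1M output prices in this guide. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def tokens_per_dollar(price_per_million):
    """How many tokens one US dollar buys at a given per-1M-token price."""
    return 1_000_000 / price_per_million


grok_fast = tokens_per_dollar(0.50)   # Grok 4.1 Fast output
opus = tokens_per_dollar(25.00)       # Opus 4.7 output
print(f"Grok 4.1 Fast: {grok_fast:,.0f} tokens per $1")
print(f"Opus 4.7:      {opus:,.0f} tokens per $1")
print(f"Ratio: {grok_fast / opus:.0f}x")
```

That is 2,000,000 output tokens per dollar against 40,000, which is the 50x gap. Swapping in another model's output price gives the same comparison for any pair.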
Gemini 3.1 Pro price hike
Gemini 3.1 Pro jumped from $1.25/$10 to $2.00/$12. No longer matches GPT-5 pricing — now more expensive.
o4-mini price crash
o4-mini dropped from $4/$16 to $1.10/$4.40, a price cut of about 72%. Now genuinely competitive with Haiku.
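The size of the cut can be checked directly from the old and new prices quoted here. A small sketch, with a hypothetical helper name:

```python
def percent_cut(old_price, new_price):
    """Percentage reduction from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


print(f"input:  {percent_cut(4.00, 1.10):.1f}%")
print(f"output: {percent_cut(16.00, 4.40):.1f}%")
```

Both input and output come out at 72.5%, slightly more than the rounded 70% headline figure.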