On this page
- TL;DR
- How to read this guide
- The big three (frontier labs)
- OpenAI
- Anthropic
- Google (Gemini)
- The fast inference providers (open models, crazy speeds)
- Groq
- Together AI
- Fireworks AI
- The value providers (cheap but capable)
- DeepSeek
- Mistral
- The specialized players
- xAI (Grok)
- Cohere
- Perplexity
- OpenRouter
- Real-world cost for common tasks
- Task: Short chat message (200 in, 400 out)
- Task: Long document summary (50k in, 2k out)
- Task: Bulk classification (1,000 items × 500 in × 50 out)
- Free tiers worth knowing about
- What you should actually use
- The thing this table doesn't show
- Keep this up to date
TL;DR
- The gap between the cheapest and most expensive flagship model is now nearly 190x on output tokens: Gemini 2.0 Flash at $0.40/M vs. Claude Opus 4 at $75/M.
- For 80% of production workloads, Claude Sonnet 4.6 or GPT-4o-mini hit the quality/price sweet spot.
- Open-source models on Groq and Together cost pennies for speeds that were hyperscaler-only 18 months ago.
- This guide lists every major provider with current prices (February 2026), their quirks, and what each is best for.
Bookmark this post — we keep it updated. The live, always-current pricing is at /price-tracker.
How to read this guide
Every provider has its own pricing model, but the dominant pattern is $ per 1 million tokens, split into input (what you send to the model) and output (what the model generates). Output tokens are usually 3-5x more expensive than input tokens.
A token ≈ 4 characters of English ≈ 0.75 words. A typical chat message is 150-300 tokens. A long document might be 10,000+ tokens.
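If you just need ballpark budgeting, the rule of thumb above is enough to sketch a toy estimator. This is not a real tokenizer (actual counts vary by model; use your provider's tokenizer library for exact numbers):

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters of English per token.
    Real tokenizers differ per model; this is only for ballpark budgeting."""
    return max(1, round(len(text) / 4))

prompt = "Summarize the attached meeting notes in three bullet points."
print(estimate_tokens(prompt))  # ~15 tokens for this 60-character prompt
```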
All prices below are as of February 2026 and quoted in USD per 1 million tokens.
The big three (frontier labs)
OpenAI
| Model | Input | Output | Context | Best for |
|---|---|---|---|---|
| GPT-5 | $5.00 | $15.00 | 256k | Hardest reasoning, long planning |
| GPT-4o | $2.50 | $10.00 | 128k | General-purpose workhorse |
| GPT-4o-mini | $0.15 | $0.60 | 128k | Bulk tasks, fast + cheap |
| o3 | $15.00 | $60.00 | 200k | Expensive deep reasoning |
| o3-mini | $1.10 | $4.40 | 200k | Good reasoning at a reasonable price |
| GPT-Image-1 | Per image | — | — | Image generation ($0.04/std, $0.17/HD) |
Notes: OpenAI offers prompt caching (50% discount on repeated input tokens) and a batch API (50% off, asynchronous). The free tier is limited; you need to add at least $5 in credit to unlock higher-tier rate limits.
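To see what that caching discount is worth, here's the blended-price arithmetic as a sketch. The 50% discount comes from the note above; the 80% cached fraction is an assumed example and will vary with your workload:

```python
def effective_input_price(base_price_per_m: float, cached_fraction: float,
                          cache_discount: float = 0.5) -> float:
    """Blended input price per 1M tokens when a fraction of each prompt
    is served from the prompt cache at a discount (50% for OpenAI)."""
    return base_price_per_m * (cached_fraction * (1 - cache_discount)
                               + (1 - cached_fraction))

# A chat app resending an 80%-identical system prompt on GPT-4o ($2.50/M in):
print(round(effective_input_price(2.50, 0.8), 2))  # 1.5
```

The same arithmetic applies to Anthropic's caching below, just with a steeper discount.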
Best for: Anything where quality and ecosystem support matter. OpenAI has the most mature tooling — function calling, structured outputs, assistants API.
Anthropic
| Model | Input | Output | Context | Best for |
|---|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | 200k (1M w/ beta) | Best coding model, hardest tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200k | Sweet spot: near-Opus quality at one-fifth the price |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200k | Fast, cheap, still surprisingly good |
Notes: Anthropic offers prompt caching (up to 90% discount after the first cached request). The 1M context variant of Opus/Sonnet costs 2x the base rates. No image generation — Claude is text-and-vision-input only.
Best for: Coding, long-document reasoning, agentic workflows, instruction-following.
Google (Gemini)
| Model | Input | Output | Context | Best for |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | 2M | Massive context, research, long video |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | Cheap workhorse for huge context |
| Gemini 2.0 Flash-Lite | $0.05 | $0.20 | 1M | Dirt-cheap batch workloads |
Notes: Gemini's context window is its superpower — 2M tokens on 2.5 Pro is unmatched. Free tier on AI Studio is generous but uses your data for training. Paid/Vertex does not.
Best for: Massive documents, long codebases, video understanding, multimodal tasks.
The fast inference providers (open models, crazy speeds)
Groq
| Model | Input | Output | Speed | Context |
|---|---|---|---|---|
| Llama 3.3 70B | $0.59 | $0.79 | ~300 tok/s | 128k |
| Llama 3.1 8B | $0.05 | $0.08 | ~750 tok/s | 128k |
| DeepSeek R1 Distill | $0.75 | $0.99 | ~200 tok/s | 128k |
| Mixtral 8x7B | $0.24 | $0.24 | ~500 tok/s | 32k |
Notes: Groq runs open-source models on custom LPU hardware. Output tokens per second are typically 5-10x faster than GPU-based hosts. Rate limits are per-model and can be hit during peak hours.
Best for: Real-time chat where perceived speed matters, voice applications, streaming UX.
Together AI
| Model | Input | Output | Context |
|---|---|---|---|
| Llama 3.3 70B Turbo | $0.88 | $0.88 | 128k |
| Llama 3.1 405B Turbo | $3.50 | $3.50 | 128k |
| DeepSeek V3 | $1.25 | $1.25 | 64k |
| Qwen 2.5 72B | $1.20 | $1.20 | 32k |
Notes: Together offers the widest selection of open-source models, including specialized coding and math variants. Dedicated endpoints available for enterprise.
Best for: Open-source model experimentation, specialized models (coder variants, math variants), self-deployed workflows.
Fireworks AI
| Model | Input | Output | Notes |
|---|---|---|---|
| Llama 3.3 70B | $0.90 | $0.90 | Fast inference |
| Mixtral 8x22B | $1.20 | $1.20 | High-quality MoE |
| Fireworks-tuned custom | Per config | Per config | Fine-tuning platform |
Best for: Fine-tuning your own model variants, production OSS serving with SLAs.
The value providers (cheap but capable)
DeepSeek
| Model | Input | Output | Context |
|---|---|---|---|
| DeepSeek V3 | $0.27 (off-peak $0.07) | $1.10 (off-peak $0.27) | 64k |
| DeepSeek R1 | $0.55 (off-peak $0.14) | $2.19 (off-peak $0.55) | 64k |
Notes: Off-peak discount windows (roughly 16:30-00:30 UTC) cut prices 75%. Quality genuinely competes with GPT-4o on most benchmarks at a fraction of the price. Data sovereignty concerns for some enterprise users (servers in China).
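That off-peak window wraps past midnight UTC, which is easy to get wrong in scheduling code. A minimal check, with prices hard-coded from the table above and the window assumed inclusive at the start, exclusive at the end:

```python
from datetime import datetime, time

OFF_PEAK_START = time(16, 30)  # 16:30 UTC
OFF_PEAK_END = time(0, 30)     # 00:30 UTC the next day

def is_off_peak(now_utc: datetime) -> bool:
    """True inside DeepSeek's discount window, which wraps past midnight."""
    t = now_utc.time()
    return t >= OFF_PEAK_START or t < OFF_PEAK_END

def deepseek_v3_prices(now_utc: datetime) -> tuple[float, float]:
    """(input, output) price per 1M tokens for DeepSeek V3."""
    return (0.07, 0.27) if is_off_peak(now_utc) else (0.27, 1.10)
```

Batch jobs that can tolerate a few hours of delay are the obvious fit for this kind of scheduling.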
Best for: Cost-sensitive production workloads, research, non-sensitive consumer products.
Mistral
| Model | Input | Output | Context |
|---|---|---|---|
| Mistral Large 2 | $2.00 | $6.00 | 128k |
| Mistral Medium | $0.40 | $2.00 | 32k |
| Mistral Small | $0.20 | $0.60 | 32k |
| Codestral | $0.30 | $0.90 | 32k |
Notes: European-based, strong GDPR posture. Codestral is specifically tuned for code. Does not train on API data.
Best for: EU companies with data residency requirements, code-focused workflows, and teams that want an option outside both the US and China.
The specialized players
xAI (Grok)
| Model | Input | Output | Context |
|---|---|---|---|
| Grok 3 | $5.00 | $15.00 | 128k |
| Grok 3 Mini | $0.30 | $0.50 | 128k |
Best for: Real-time Twitter/X data access (the one thing it has that others don't).
Cohere
| Model | Input | Output | Context |
|---|---|---|---|
| Command R+ | $2.50 | $10.00 | 128k |
| Command R | $0.15 | $0.60 | 128k |
| Embed v3 (embeddings) | $0.10 | — | — |
Best for: Retrieval-heavy workloads, RAG pipelines, enterprise search.
Perplexity
| Model | Input | Output | Notes |
|---|---|---|---|
| Sonar Pro | $3.00 | $15.00 | Built-in web search |
| Sonar | $1.00 | $1.00 | Faster web-grounded |
Best for: Applications that need web search grounding baked in. Pricing is non-standard: a per-request search fee applies on top of token costs.
OpenRouter
OpenRouter is not a model provider — it's a routing layer that gives you one API to access most of the models listed above. Prices are the provider's price plus a small margin (typically 0-5%). Great if you don't want to manage multiple API keys, though BYOK with direct provider keys is usually cheaper.
Real-world cost for common tasks
Here's what common tasks actually cost across providers. All assume one full round-trip (prompt + response).
Task: Short chat message (~200 in, ~400 out)
| Model | Cost | Feels like |
|---|---|---|
| GPT-4o-mini | $0.0003 | Essentially free |
| Claude Haiku 3.5 | $0.0018 | Essentially free |
| GPT-4o | $0.0045 | A few cents per dozen messages |
| Claude Sonnet 4.6 | $0.0066 | A few cents per dozen messages |
| GPT-5 | $0.0070 | Under 1¢ per message |
| Claude Opus 4 | $0.0330 | ~3¢ per message |
Task: Long document summary (~50k in, ~2k out)
| Model | Cost |
|---|---|
| Gemini 2.0 Flash | $0.006 |
| GPT-4o-mini | $0.009 |
| Claude Sonnet 4.6 | $0.180 |
| GPT-4o | $0.145 |
| Claude Opus 4 | $0.900 |
Task: Bulk classification (1,000 items × 500 in × 50 out)
| Model | Cost |
|---|---|
| Gemini 2.0 Flash-Lite | $0.035 |
| GPT-4o-mini | $0.105 |
| Claude Haiku 3.5 | $0.600 |
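The tables above are straight multiplication, so you can reproduce any row yourself. A tiny helper, with prices hard-coded from the tables in this post:

```python
def task_cost(input_tokens: int, output_tokens: int,
              price_in: float, price_out: float) -> float:
    """Dollar cost of one round trip; prices are quoted per 1M tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Short chat on GPT-4o-mini ($0.15 in / $0.60 out):
print(round(task_cost(200, 400, 0.15, 0.60), 5))         # 0.00027
# Long-document summary on Claude Sonnet 4.6 ($3 in / $15 out):
print(round(task_cost(50_000, 2_000, 3.00, 15.00), 3))   # 0.18
# Bulk classification, 1,000 items on Gemini 2.0 Flash-Lite:
print(round(task_cost(1_000 * 500, 1_000 * 50, 0.05, 0.20), 3))  # 0.035
```

Swap in your own token counts and the per-model prices above to estimate any workload.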
Free tiers worth knowing about
- Google AI Studio: Generous free tier. Catch: trains on your data. Use for experimentation, not production user data.
- Groq: Free tier with rate limits. Great for learning, prototyping.
- Mistral: "La Plateforme" has a free tier for experimentation.
- Together: $5 free credit for new accounts, plus a free tier on smaller models.
- OpenRouter: Many free-tier endpoints for open-source models.
What you should actually use
If we had to pick one default for each category:
- Best all-rounder: Claude Sonnet 4.6. Good at everything; a week of heavy use costs about as much as a decent coffee.
- Best cheap default: GPT-4o-mini. Near-free, great quality.
- Best quality-regardless-of-price: Claude Opus 4 for code, GPT-5 for reasoning.
- Best for huge documents: Gemini 2.5 Pro (2M context).
- Best for speed: Groq + Llama 3.3 70B (~300 tokens/sec).
- Best for privacy-conscious EU workloads: Mistral Large 2.
- Best for "I just want it free": Gemini 2.0 Flash free tier, or DeepSeek off-peak.
The thing this table doesn't show
Token prices aren't the whole story. The real production cost includes:
- Latency. Groq costs more per token than Flash-Lite, but its speed can let you skip a caching layer.
- Rate limits. A cheap model with low rate limits costs you engineer time to wait out the backoff.
- Reliability. Some providers have 99.99% uptime; others have bad days.
- Feature support. Function calling, structured outputs, vision — not every provider has everything.
That's why BYOK and multi-provider clients like NovaKit matter: you can pick the best model for each specific task without negotiating procurement for each one.
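One way to act on this is an ordered fallback list: try the cheap or fast provider first and fall through on errors. This is a sketch only; `call_groq` and `call_mini` below are hypothetical stand-ins for whatever client functions you actually use, not real NovaKit or provider APIs:

```python
from typing import Callable

def call_with_fallback(prompt: str,
                       providers: list[Callable[[str], str]]) -> str:
    """Try providers in priority order; fall through to the next one
    on any error (rate limit, timeout, outage)."""
    errors: list[Exception] = []
    for call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # in production, catch specific error types
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")

# Hypothetical stand-ins simulating a rate-limited primary and a healthy backup:
def call_groq(p: str) -> str: raise TimeoutError("rate limited")
def call_mini(p: str) -> str: return "ok from gpt-4o-mini"

print(call_with_fallback("hi", [call_groq, call_mini]))  # ok from gpt-4o-mini
```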
Keep this up to date
Prices change. OpenAI and Google cut prices roughly every 6 months. Anthropic prices are relatively stable. New providers emerge monthly.
For live, always-current pricing see our price tracker. For estimating your monthly spend under different usage patterns, use the cost calculator.
Stop guessing at AI costs. NovaKit tracks every message's token count and dollar cost in real time — for every provider, no matter which model you pick.