cost-optimization · February 24, 2026 · 12 min read

The Complete AI API Pricing Guide 2026: All 13 Major Providers Compared

Every AI API price, updated for 2026. GPT-4o, Claude Opus 4, Gemini 2.5 Pro, Groq, DeepSeek, Mistral, and more — input/output tokens, free tiers, rate limits, and real-world cost per message. Bookmark this.

TL;DR

  • The gap between the cheapest and most expensive flagship model is now roughly 750x — Gemini 2.0 Flash at $0.10/M input tokens vs. Claude Opus 4 at $75/M output.
  • For 80% of production workloads, Claude Sonnet 4.6 or GPT-4o-mini hit the quality/price sweet spot.
  • Open-source models on Groq and Together cost pennies for speeds that were hyperscaler-only 18 months ago.
  • This guide lists every major provider with current prices (February 2026), their quirks, and what each is best for.

Bookmark this post — we keep it updated. The live, always-current pricing is at /price-tracker.

How to read this guide

Every provider has its own pricing model, but the dominant pattern is $ per 1 million tokens, split into input (what you send to the model) and output (what the model generates). Output tokens are usually 3-5x more expensive than input tokens.

A token ≈ 4 characters of English ≈ 0.75 words. A typical chat message is 150-300 tokens. A long document might be 10,000+ tokens.

All prices below are as of February 2026 and quoted in USD per 1 million tokens.
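The arithmetic behind every number in this guide is the same two-term formula. A minimal sketch (the prices passed in are illustrative, not live):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Cost in USD, given per-1M-token prices for input and output."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters of English per token."""
    return max(1, len(text) // 4)

# A typical chat round-trip on GPT-4o ($2.50/M in, $10.00/M out):
cost = estimate_cost(200, 400, 2.50, 10.00)
print(f"${cost:.4f}")  # → $0.0045
```

Note that the output side dominates: 400 output tokens cost eight times as much as the 200 input tokens here.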

The big three (frontier labs)

OpenAI

| Model | Input | Output | Context | Best for |
|---|---|---|---|---|
| GPT-5 | $5.00 | $15.00 | 256k | Hardest reasoning, long planning |
| GPT-4o | $2.50 | $10.00 | 128k | General-purpose workhorse |
| GPT-4o-mini | $0.15 | $0.60 | 128k | Bulk tasks, fast + cheap |
| o3 | $15.00 | $60.00 | 200k | Expensive deep reasoning |
| o3-mini | $1.10 | $4.40 | 200k | Good reasoning at a reasonable price |
| GPT-Image-1 | — | Per image | — | Image generation ($0.04/std, $0.17/HD) |

Notes: OpenAI offers prompt caching (50% discount on repeated input tokens) and a batch API (50% off, async). The free tier is limited; you need to add at least $5 in credit to unlock higher-tier rate limits.
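Those two discounts change the effective blended rate more than the headline price suggests. A sketch of the caching math, assuming the 50% discount above applies to the cached fraction of the prompt:

```python
def effective_input_cost(tokens: int, cached_fraction: float,
                         base_price: float, cache_discount: float = 0.5) -> float:
    """Blended input cost (USD) when part of the prompt hits the prompt cache."""
    cached = tokens * cached_fraction
    fresh = tokens - cached
    return (fresh * base_price + cached * base_price * (1 - cache_discount)) / 1_000_000

# 100k-token prompt on GPT-4o where 80% is a repeated system prompt:
print(effective_input_cost(100_000, 0.8, 2.50))  # → 0.15, vs $0.25 uncached
```

For a chatbot that resends a large system prompt every turn, that 40% saving compounds across every request.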

Best for: Anything where quality and ecosystem support matter. OpenAI has the most mature tooling — function calling, structured outputs, assistants API.

Anthropic

| Model | Input | Output | Context | Best for |
|---|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | 200k (1M w/ beta) | Best coding model, hardest tasks |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200k | Sweet spot: Opus-level quality at one-fifth the price |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200k | Fast, cheap, still surprisingly good |

Notes: Anthropic offers prompt caching (up to 90% discount after the first cached request). The 1M context variant of Opus/Sonnet costs 2x the base rates. No image generation — Claude is text-and-vision-input only.

Best for: Coding, long-document reasoning, agentic workflows, instruction-following.

Google (Gemini)

| Model | Input | Output | Context | Best for |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | 2M | Massive context, research, long video |
| Gemini 2.0 Flash | $0.10 | $0.40 | 1M | Cheap workhorse for huge context |
| Gemini 2.0 Flash-Lite | $0.05 | $0.20 | 1M | Dirt-cheap batch workloads |

Notes: Gemini's context window is its superpower — 2M tokens on 2.5 Pro is unmatched. Free tier on AI Studio is generous but uses your data for training. Paid/Vertex does not.

Best for: Massive documents, long codebases, video understanding, multimodal tasks.

The fast inference providers (open models, crazy speeds)

Groq

| Model | Input | Output | Speed | Context |
|---|---|---|---|---|
| Llama 3.3 70B | $0.59 | $0.79 | ~300 tok/s | 128k |
| Llama 3.1 8B | $0.05 | $0.08 | ~750 tok/s | 128k |
| DeepSeek R1 Distill | $0.75 | $0.99 | ~200 tok/s | 128k |
| Mixtral 8x7B | $0.24 | $0.24 | ~500 tok/s | 32k |

Notes: Groq runs open-source models on custom LPU hardware. Output tokens per second are typically 5-10x faster than GPU-based hosts. Rate limits are per-model and can be hit during peak hours.

Best for: Real-time chat where perceived speed matters, voice applications, streaming UX.

Together AI

| Model | Input | Output | Context |
|---|---|---|---|
| Llama 3.3 70B Turbo | $0.88 | $0.88 | 128k |
| Llama 3.1 405B Turbo | $3.50 | $3.50 | 128k |
| DeepSeek V3 | $1.25 | $1.25 | 64k |
| Qwen 2.5 72B | $1.20 | $1.20 | 32k |

Notes: Together offers the widest selection of open-source models, including specialized coding and math variants. Dedicated endpoints available for enterprise.

Best for: Open-source model experimentation, specialized models (coder variants, math variants), self-deployed workflows.

Fireworks AI

| Model | Input | Output | Notes |
|---|---|---|---|
| Llama 3.3 70B | $0.90 | $0.90 | Fast inference |
| Mixtral 8x22B | $1.20 | $1.20 | High-quality MoE |
| Fireworks-tuned custom | Per config | Per config | Fine-tuning platform |

Best for: Fine-tuning your own model variants, production OSS serving with SLAs.

The value providers (cheap but capable)

DeepSeek

| Model | Input | Output | Context |
|---|---|---|---|
| DeepSeek V3 | $0.27 (off-peak $0.07) | $1.10 (off-peak $0.27) | 64k |
| DeepSeek R1 | $0.55 (off-peak $0.14) | $2.19 (off-peak $0.55) | 64k |

Notes: Off-peak discount windows (roughly 16:30-00:30 UTC) cut prices 75%. Quality genuinely competes with GPT-4o on most benchmarks at a fraction of the price. Data sovereignty concerns for some enterprise users (servers in China).
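If you batch non-urgent work into the discount window, the routing logic is trivial but easy to get wrong: the window wraps past midnight UTC, so it is a logical OR rather than a simple range check. A sketch, using the window and prices quoted above (verify against DeepSeek's current schedule before relying on it):

```python
from datetime import time

OFF_PEAK_START = time(16, 30)  # 16:30 UTC
OFF_PEAK_END = time(0, 30)     # 00:30 UTC, the following day

def is_off_peak(t: time) -> bool:
    """The window wraps midnight, so: after start OR before end."""
    return t >= OFF_PEAK_START or t <= OFF_PEAK_END

def v3_input_price(t: time) -> float:
    # DeepSeek V3 input: $0.27/M standard, $0.07/M off-peak (per the table above)
    return 0.07 if is_off_peak(t) else 0.27

print(v3_input_price(time(18, 0)))  # → 0.07 (inside the window)
print(v3_input_price(time(9, 0)))   # → 0.27 (outside)
```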

Best for: Cost-sensitive production workloads, research, non-sensitive consumer products.

Mistral

| Model | Input | Output | Context |
|---|---|---|---|
| Mistral Large 2 | $2.00 | $6.00 | 128k |
| Mistral Medium | $0.40 | $2.00 | 32k |
| Mistral Small | $0.20 | $0.60 | 32k |
| Codestral | $0.30 | $0.90 | 32k |

Notes: European-based, strong GDPR posture. Codestral is specifically tuned for code. Does not train on API data.

Best for: EU companies with data residency requirements, code-focused workflows, no-China / no-US alternatives.

The specialized players

xAI (Grok)

| Model | Input | Output | Context |
|---|---|---|---|
| Grok 3 | $5.00 | $15.00 | 128k |
| Grok 3 Mini | $0.30 | $0.50 | 128k |

Best for: Real-time Twitter/X data access (the one thing it has that others don't).

Cohere

| Model | Input | Output | Context |
|---|---|---|---|
| Command R+ | $2.50 | $10.00 | 128k |
| Command R | $0.15 | $0.60 | 128k |
| Embed v3 | $0.10 | — | Embeddings |

Best for: Retrieval-heavy workloads, RAG pipelines, enterprise search.

Perplexity

| Model | Input | Output | Notes |
|---|---|---|---|
| Sonar Pro | $3.00 | $15.00 | Built-in web search |
| Sonar | $1.00 | $1.00 | Faster web-grounded |

Best for: Applications that need web search grounding baked in. Non-standard pricing includes per-request search fee.

OpenRouter

OpenRouter is not a model provider — it's a routing layer that gives you one API to access most of the models listed above. Prices are the provider's price plus a small margin (typically 0-5%). Great if you don't want to manage multiple API keys, though BYOK with direct provider keys is usually cheaper.

Real-world cost for common tasks

Here's what common tasks actually cost across providers. All assume one full round-trip (prompt + response).

Task: Short chat message (~200 in, ~400 out)

| Model | Cost | Feels like |
|---|---|---|
| GPT-4o-mini | $0.0003 | Essentially free |
| Claude Haiku 3.5 | $0.0018 | Essentially free |
| GPT-4o | $0.0045 | A few cents per dozen messages |
| Claude Sonnet 4.6 | $0.0066 | A few cents per dozen messages |
| GPT-5 | $0.0070 | Under 1¢ per message |
| Claude Opus 4 | $0.0330 | ~3¢ per message |

Task: Long document summary (~50k in, ~2k out)

| Model | Cost |
|---|---|
| Gemini 2.0 Flash | $0.006 |
| GPT-4o-mini | $0.009 |
| GPT-4o | $0.145 |
| Claude Sonnet 4.6 | $0.180 |
| Claude Opus 4 | $0.900 |

Task: Bulk classification (1,000 items × 500 in × 50 out)

| Model | Cost |
|---|---|
| Gemini 2.0 Flash-Lite | $0.035 |
| GPT-4o-mini | $0.105 |
| Claude Haiku 3.5 | $0.600 |
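These bulk figures are easy to sanity-check yourself: 1,000 items × 500 input tokens is 500k input tokens total, plus 50k output. For GPT-4o-mini at the prices listed earlier:

```python
items, in_per_item, out_per_item = 1_000, 500, 50
total_in = items * in_per_item    # 500,000 input tokens
total_out = items * out_per_item  # 50,000 output tokens

# GPT-4o-mini: $0.15/M input, $0.60/M output
cost = (total_in * 0.15 + total_out * 0.60) / 1_000_000
print(f"${cost:.3f}")  # → $0.105
```

The same scaling logic tells you when a batch API's 50% discount is worth the async wait: at 10x the volume, that discount on this job is worth about 53¢.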

Free tiers worth knowing about

  • Google AI Studio: Generous free tier. Catch: trains on your data. Use for experimentation, not production user data.
  • Groq: Free tier with rate limits. Great for learning, prototyping.
  • Mistral: "La Plateforme" has a free tier for experimentation.
  • Together: $5 free credit for new accounts, plus a free tier on smaller models.
  • OpenRouter: Many free-tier endpoints for open-source models.

What you should actually use

If we had to pick one default for each category:

  • Best all-rounder: Claude Sonnet 4.6. Good at everything; even heavy use costs about as much per week as a decent coffee.
  • Best cheap default: GPT-4o-mini. Near-free, great quality.
  • Best quality-regardless-of-price: Claude Opus 4 for code, GPT-5 for reasoning.
  • Best for huge documents: Gemini 2.5 Pro (2M context).
  • Best for speed: Groq + Llama 3.3 70B (~300 tokens/sec).
  • Best for privacy-conscious EU workloads: Mistral Large 2.
  • Best for "I just want it free": Gemini 2.0 Flash free tier, or DeepSeek off-peak.

The thing this table doesn't show

Token prices aren't the whole story. The real production cost includes:

  • Latency. Groq may cost more per token than Flash-Lite, but its speed may let you skip a caching layer.
  • Rate limits. A cheap model with low rate limits costs you engineer time to wait out the backoff.
  • Reliability. Some providers have 99.99% uptime; others have bad days.
  • Feature support. Function calling, structured outputs, vision — not every provider has everything.

That's why BYOK and multi-provider clients like NovaKit matter: you can pick the best model for each specific task without negotiating procurement for each one.

Keep this up to date

Prices change. OpenAI and Google cut prices roughly every 6 months. Anthropic prices are relatively stable. New providers emerge monthly.

For live, always-current pricing see our price tracker. For estimating your monthly spend under different usage patterns, use the cost calculator.


Stop guessing at AI costs. NovaKit tracks every message's token count and dollar cost in real time — for every provider, no matter which model you pick.
