AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs

Introduction
AI APIs are now widely used for chatbots, copilots, automation, and SaaS products. Most platforms charge based on tokens, meaning developers must estimate usage costs before building products.
The three main providers are OpenAI, Anthropic Claude, and Google Gemini. Their pricing models differ, and understanding those differences can significantly impact infrastructure costs. This comparison uses official pricing pages and documentation as of March 2026.
How AI API Pricing Works
Tokens are the unit of text that models process. Roughly, one token equals four characters or three-quarters of a word in English.
Providers charge separately for input tokens (what you send) and output tokens (what the model generates). Output is typically 5–10x more expensive than input because generation is computationally heavier. Cost is calculated as: token_count / 1,000,000 × price_per_1M_tokens.
Larger models cost more. Long responses increase cost. High-volume apps must carefully estimate token usage before scaling. Even when input prices look similar across providers, output prices and the share of output in your workload can shift monthly TCO dramatically.
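That arithmetic is simple enough to script. A minimal sketch in Python, using the GPT-5 list prices from the comparison table below for illustration:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """USD cost of one request at per-1M-token list prices."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# One chat turn: 750 input + 250 output tokens at GPT-5 rates ($1.25 / $10.00)
per_message = request_cost(750, 250, 1.25, 10.00)
print(f"${per_message:.6f} per message")  # roughly a third of a cent per turn
```

Note how output dominates: at these rates the 250 output tokens cost almost three times as much as the 750 input tokens.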
AI API Pricing Comparison Table
Official list prices (USD per 1M tokens) from provider pricing pages:
| Provider | Model | Input (USD/1M) | Output (USD/1M) | Caching | Notes |
|---|---|---|---|---|---|
| OpenAI | GPT-5 | $1.25 | $10.00 | cached $0.125 | General purpose |
| OpenAI | GPT-5 Mini | $0.25 | $2.00 | cached $0.025 | Budget option |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | cached $0.25 | Tiered for >272K context |
| Claude | Opus 4.6 | $5.00 | $25.00 | cache hit $0.50 | Flagship reasoning |
| Claude | Sonnet 4.6 | $3.00 | $15.00 | cache hit $0.30 | Cost/performance sweet spot |
| Claude | Haiku 4.5 | $1.00 | $5.00 | cache hit $0.10 | High-volume tasks |
| Gemini | 2.5 Pro | $1.25 (≤200k) / $2.50 (>200k) | $10 / $15 | context + storage | Tiered by prompt size |
| Gemini | 2.5 Flash | $0.30 | $2.50 | context $0.03 | High volume, low latency |
| Gemini | 2.5 Flash-Lite | $0.10 | $0.40 | context $0.01 | Lowest cost floor |
OpenAI offers cached input (often ~10x cheaper) and a Batch API (50% off for non-real-time jobs). Claude offers strong reasoning and prompt caching, with cache hits often ~90% cheaper than fresh input. Gemini has a free tier, context caching with storage pricing, and tight Google ecosystem integration.
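To see what caching is worth, a rough sketch: assume a 2,000-token system prefix reused across 100,000 requests at the GPT-5 rates above (the prefix size and volume are assumptions for illustration):

```python
# Savings from cached input, at GPT-5 list prices from the table above:
# $1.25/1M fresh input vs $0.125/1M cached input.
PREFIX_TOKENS = 2_000   # repeated system prompt (assumption)
REQUESTS = 100_000      # monthly volume (assumption)

total_m = PREFIX_TOKENS * REQUESTS / 1_000_000  # 200M prefix tokens
fresh = total_m * 1.25
cached = total_m * 0.125
print(f"fresh ${fresh:.2f} vs cached ${cached:.2f}")  # fresh $250.00 vs cached $25.00
```

At this scale the repeated prefix alone drops from $250 to $25 per month, which is why caching matters most for long, stable system prompts.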
Real Cost Examples
Example 1 — Chatbot (100k messages/month)
Assume 100,000 messages per month, 1,000 tokens per message, split 75% input / 25% output (750 input, 250 output per message). That's 75M input + 25M output tokens monthly.
| Provider | Model | Monthly Cost (USD) |
|---|---|---|
| OpenAI | GPT-5 Mini | ~$69 |
| OpenAI | GPT-5 | ~$344 |
| OpenAI | GPT-5.4 | ~$563 |
| Claude | Haiku 4.5 | ~$200 |
| Claude | Sonnet 4.6 | ~$600 |
| Claude | Opus 4.6 | ~$1,000 |
| Gemini | 2.5 Flash-Lite | ~$17.50 |
| Gemini | 2.5 Flash | ~$85 |
| Gemini | 2.5 Pro | ~$344 |
Usage volume scales linearly. The output share drives cost: output-heavy workloads (long replies, agentic workflows) cost more because output is priced higher than input.
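The table above can be reproduced in a few lines. Prices come from the comparison table and the workload (100k messages, 750 input / 250 output tokens each) from the example; the helper itself is a sketch:

```python
def monthly_cost(messages: int, in_tok: int, out_tok: int,
                 in_price: float, out_price: float) -> float:
    """Monthly USD cost at per-1M-token list prices."""
    return (messages * in_tok / 1e6) * in_price \
         + (messages * out_tok / 1e6) * out_price

# (input, output) USD per 1M tokens, from the comparison table above
models = {
    "GPT-5 Mini": (0.25, 2.00),
    "GPT-5": (1.25, 10.00),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}
for name, (p_in, p_out) in models.items():
    print(f"{name}: ${monthly_cost(100_000, 750, 250, p_in, p_out):,.2f}")
```

Plugging in any other row of the pricing table gives the corresponding monthly figure.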
Example 2 — Embedding Pipeline
1 million documents × 512 tokens each = 512M embedding tokens. Embedding models are priced at a single per-token rate, with no input/output split.
| Provider | Model | Cost (USD) |
|---|---|---|
| OpenAI | text-embedding-3-small | ~$10.24 |
| OpenAI | text-embedding-3-large | ~$66.56 |
| Gemini | gemini-embedding-001 | ~$76.80 |
| Claude | — | No first-party embedding; RAG typically uses external provider |
OpenAI's entry embedding model is roughly an order of magnitude cheaper than Gemini for the same token volume.
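The embedding figures follow directly from per-1M-token rates; the rates below are back-calculated from the table above, so treat them as illustrative:

```python
DOCS = 1_000_000
TOKENS_PER_DOC = 512
total_m = DOCS * TOKENS_PER_DOC / 1_000_000  # 512M tokens

# USD per 1M embedding tokens, implied by the cost table above
prices = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "gemini-embedding-001": 0.15,
}
for model, rate in prices.items():
    print(f"{model}: ${total_m * rate:.2f}")
```

Because embeddings are a one-off bulk cost, even the "expensive" option here is cheap relative to the chat workloads above.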
Example 3 — Multimodal (Image + Text)
10,000 images/month; 50 text tokens input + 200 output per image. Image tokenization varies by provider (tile-based for OpenAI, media_resolution for Gemini, megapixel formula for Claude). Cost is dominated by image tokens and output.
| Provider | Model | Monthly Cost (USD) |
|---|---|---|
| OpenAI | GPT-5 | ~$29 |
| Claude | Haiku 4.5 | ~$24 |
| Claude | Sonnet 4.6 | ~$72 |
| Claude | Opus 4.6 | ~$119 |
| Gemini | 2.5 Flash-Lite | ~$1.11 |
| Gemini | 2.5 Flash | ~$6 |
| Gemini | 2.5 Pro | ~$24 |
Image token counts depend on resolution and each provider's algorithm. Use the `media_resolution` setting (Gemini) or the `detail` parameter (OpenAI), or resize images before upload (Claude), to control cost.
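As a rough sanity check, the Gemini 2.5 Flash-Lite figure can be reproduced by assuming a flat ~258 tokens per image (a typical per-image count for small images on Gemini; treat both the count and the workload split as assumptions):

```python
IMAGES = 10_000
IMG_TOKENS = 258        # assumed flat per-image token count (small images)
TEXT_IN, TEXT_OUT = 50, 200

# Gemini 2.5 Flash-Lite list prices from the table: $0.10 in / $0.40 out per 1M
in_tok = IMAGES * (IMG_TOKENS + TEXT_IN)   # image tokens bill as input
out_tok = IMAGES * TEXT_OUT
cost = in_tok / 1e6 * 0.10 + out_tok / 1e6 * 0.40
print(f"${cost:.2f}/month")
```

The output tokens account for most of the bill even here, which is why short captions are much cheaper than long image descriptions.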
Sensitivity: Output Share and Token Volume
At 100k messages/month and 1,000 tokens/message, shifting the input/output mix changes cost significantly:
| Mix (input/output) | OpenAI GPT-5 | Claude Sonnet 4.6 | Gemini 2.5 Flash |
|---|---|---|---|
| 60% / 40% | ~$475 | ~$780 | ~$118 |
| 75% / 25% | ~$344 | ~$600 | ~$85 |
| 85% / 15% | ~$256 | ~$480 | ~$63 |
Reducing tokens per turn (shorter prompts, summarisation, retrieval) or moving workloads to mini/lite models is the main cost lever.
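The sensitivity rows can be regenerated by sweeping the output share over a fixed monthly token budget. A sketch using the Claude Sonnet 4.6 prices from the table:

```python
def mix_cost(total_tokens_m: float, out_share: float,
             in_price: float, out_price: float) -> float:
    """Monthly USD cost for a given output share of a fixed token budget (in millions)."""
    out_m = total_tokens_m * out_share
    in_m = total_tokens_m - out_m
    return in_m * in_price + out_m * out_price

# 100M tokens/month; Claude Sonnet 4.6: $3.00 in / $15.00 out per 1M
for share in (0.15, 0.25, 0.40):
    print(f"{int(share * 100)}% output: ${mix_cost(100, share, 3.00, 15.00):,.0f}")
```

Each percentage point shifted from input to output adds $12/month at these rates, which is the whole argument for keeping replies short.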
Which AI API Is Cheapest?
Cheapest for chatbot workloads
Gemini 2.5 Flash-Lite (~$17.50/month for 100k messages) is the lowest token-based cost. GPT-5 Mini (~$69) and Gemini 2.5 Flash (~$85) are competitive for simple tasks.
Cheapest for embeddings
OpenAI text-embedding-3-small (~$10 for 512M tokens) undercuts Gemini. Claude has no first-party embedding; RAG architectures typically add a second provider.
Best value for reasoning
OpenAI GPT-5 and Gemini 2.5 Pro both land around ~$344 for the base chatbot workload. Claude Sonnet (~$600) costs more but offers strong reasoning. For complex tasks, flagship models justify the premium.
Hidden Costs and Limits
Caching storage (Gemini): Context caching includes storage pricing (USD per 1M tokens per hour). Relevant for agent/RAG workloads with large repeated prompts.
Tool calls: OpenAI charges for web search, file search, and containers (per call, per GB-day, per session). Claude web search is $10 per 1,000 searches plus token costs. Gemini grounding has free tiers before billing.
Image tokenization: Token counts vary by resolution and provider. Use token-counting APIs or official formulas to estimate before scaling.
Data residency: Claude offers inference_geo "us-only" with a 1.1x price multiplier for some models. Check compliance requirements.
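Tool-call surcharges are easy to overlook in a token-based budget. A sketch using the Claude web-search rate quoted above, with an assumed monthly volume:

```python
# Per-call surcharge on top of token costs, at the $10 per 1,000 searches
# rate quoted above. The monthly volume is an assumption for illustration.
SEARCHES = 50_000
surcharge = SEARCHES / 1_000 * 10.00
print(f"${surcharge:,.2f}/month before token costs")
```

At 50k searches that surcharge alone exceeds the base chatbot bill for most of the models in the comparison table, so meter tool use separately.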
How Developers Reduce AI API Costs
- Model routing: Send short/FAQ/extraction requests to lite/mini models; reserve flagship models only when reasoning or long output is needed.
- Reduce output: Use structured outputs, limit `max_tokens`, and generate short answers with detail on demand. Verbose responses cost far more than input on most models.
- Prompt caching: Claude cache reads are often 0.1× input cost. OpenAI cached input is ~10x cheaper. Gemini charges caching plus storage; evaluate ROI for your reuse pattern.
- Batch for async workloads: OpenAI Batch API cuts cost; Claude and Gemini have batch options. Use for summarisation, classification, extraction.
- Control media resolution: Use `MEDIA_RESOLUTION_LOW` (Gemini) or `detail: "low"` (OpenAI) when fine-grained vision isn't needed. Resize images for Claude (tokens scale with megapixels).
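The routing idea in the first bullet can be sketched as a toy dispatcher. The thresholds and tier names are illustrative assumptions, not any provider's API:

```python
def pick_tier(prompt: str, needs_reasoning: bool) -> str:
    """Toy router: cheap tier for short/simple requests, flagship otherwise.
    The 2,000-character threshold is an arbitrary illustrative cutoff."""
    if needs_reasoning or len(prompt) > 2_000:
        return "flagship"   # e.g. GPT-5 / Sonnet 4.6 / 2.5 Pro
    return "lite"           # e.g. GPT-5 Mini / Haiku 4.5 / 2.5 Flash-Lite

print(pick_tier("What are your opening hours?", needs_reasoning=False))  # lite
```

In production the routing signal is usually richer (intent classification, user tier, past failure rate), but even a crude length-plus-flag rule captures most of the savings when the bulk of traffic is simple.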
OpenAI vs Claude vs Gemini: Summary
| Provider | Strength | Best For |
|---|---|---|
| OpenAI | Mature ecosystem, Batch API, cheap embeddings | General AI apps, RAG, non-real-time workloads |
| Claude | Strong reasoning, prompt caching | AI agents, complex reasoning, long context |
| Gemini | Google integration, free tier, low token cost | Enterprise, Google tools, budget projects |
Related Guides
For deeper pricing details, see:
- OpenAI API Pricing Explained (2026)
- Claude API Pricing Breakdown (2026)
- Gemini API Pricing Explained (2026)
Conclusion
Choosing an AI API is not only about model performance but also about long-term cost efficiency. Evaluate token pricing, expected usage, output share, and infrastructure scaling before selecting a provider.
For simple, high-volume workloads, Gemini 2.5 Flash-Lite is often cheapest. For balanced cost and capability, GPT-5 and Gemini 2.5 Pro compete. For complex reasoning, Claude flagship models justify their premium. Factor in embeddings, tool calls, and caching behaviour for a complete TCO picture.
Planning Your LLM API Budget?
Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.
Book a Free Strategy Call
Cite this article
- Title: AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs
- Author: Nicola Lazzari
- Published: March 5, 2026
- Updated: March 2026
- URL: https://nicolalazzari.ai/articles/ai-api-pricing-comparison-2026
- Website: nicolalazzari.ai
- Suggested citation: Nicola Lazzari. AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs. nicolalazzari.ai, updated March 2026.
Sources used
Primary sources
- OpenAI API pricing: https://platform.openai.com/docs/pricing
- Anthropic pricing: https://www.anthropic.com/pricing
- Gemini API pricing: https://ai.google.dev/pricing