AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs

Introduction
AI APIs are now widely used for chatbots, copilots, automation, and SaaS products. Most platforms charge based on tokens, meaning developers must estimate usage costs before building products.
The three main providers are OpenAI, Anthropic Claude, and Google Gemini. Their pricing models differ, and understanding those differences can significantly impact infrastructure costs. This comparison uses official pricing pages and documentation as of March 2026.
How AI API Pricing Works
Tokens are the unit of text that models process. Roughly, one token equals four characters or three-quarters of a word in English.
Providers charge separately for input tokens (what you send) and output tokens (what the model generates). Output is typically 5–10x more expensive than input because generation is computationally heavier. Cost is calculated as: token_count / 1,000,000 × price_per_1M_tokens.
Larger models cost more. Long responses increase cost. High-volume apps must carefully estimate token usage before scaling. Even when input prices look similar across providers, output prices and the share of output in your workload can shift monthly TCO dramatically.
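That arithmetic is simple enough to script. A minimal sketch in Python, using the GPT-5 list prices from the comparison table below for illustration:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """USD cost of one request at per-1M-token list prices."""
    return (input_tokens / 1_000_000) * input_per_m \
         + (output_tokens / 1_000_000) * output_per_m

# One chat turn: 750 input + 250 output tokens at GPT-5 rates ($1.25 / $10.00)
per_message = request_cost(750, 250, 1.25, 10.00)
print(f"${per_message:.6f} per message")  # roughly a third of a cent per turn
```

Note how output dominates: at these rates the 250 output tokens cost almost three times as much as the 750 input tokens.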
AI API Pricing Comparison Table
Official list prices (USD per 1M tokens) from provider pricing pages:
| Provider | Model | Input (USD/1M) | Output (USD/1M) | Caching | Notes |
|---|---|---|---|---|---|
| OpenAI | GPT-5 | $1.25 | $10.00 | cached $0.125 | General purpose |
| OpenAI | GPT-5 Mini | $0.25 | $2.00 | cached $0.025 | Budget option |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | cached $0.25 | Tiered for >272K context |
| Claude | Opus 4.6 | $5.00 | $25.00 | cache hit $0.50 | Flagship reasoning |
| Claude | Sonnet 4.6 | $3.00 | $15.00 | cache hit $0.30 | Cost/performance sweet spot |
| Claude | Haiku 4.5 | $1.00 | $5.00 | cache hit $0.10 | High-volume tasks |
| Gemini | 2.5 Pro | $1.25 (≤200k) / $2.50 (>200k) | $10 / $15 | context + storage | Tiered by prompt size |
| Gemini | 2.5 Flash | $0.30 | $2.50 | context $0.03 | High volume, low latency |
| Gemini | 2.5 Flash-Lite | $0.10 | $0.40 | context $0.01 | Lowest cost floor |
OpenAI offers cached input (often ~10x cheaper) and a Batch API (50% off for non-real-time jobs). Claude offers strong reasoning and prompt caching, with cache hits often ~90% cheaper than fresh input. Gemini has a free tier, context caching with storage pricing, and tight Google ecosystem integration.
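To see what caching is worth, a rough sketch: assume a 2,000-token system prefix reused across 100,000 requests at the GPT-5 rates above (the prefix size and volume are assumptions for illustration):

```python
# Savings from cached input, at GPT-5 list prices from the table above:
# $1.25/1M fresh input vs $0.125/1M cached input.
PREFIX_TOKENS = 2_000   # repeated system prompt (assumption)
REQUESTS = 100_000      # monthly volume (assumption)

total_m = PREFIX_TOKENS * REQUESTS / 1_000_000  # 200M prefix tokens
fresh = total_m * 1.25
cached = total_m * 0.125
print(f"fresh ${fresh:.2f} vs cached ${cached:.2f}")  # fresh $250.00 vs cached $25.00
```

At this scale the repeated prefix alone drops from $250 to $25 per month, which is why caching matters most for long, stable system prompts.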
Real Cost Examples
Example 1 — Chatbot (100k messages/month)
Assume 100,000 messages per month, 1,000 tokens per message, split 75% input / 25% output (750 input, 250 output per message). That's 75M input + 25M output tokens monthly.
| Provider | Model | Monthly Cost (USD) |
|---|---|---|
| OpenAI | GPT-5 Mini | ~$69 |
| OpenAI | GPT-5 | ~$344 |
| OpenAI | GPT-5.4 | ~$563 |
| Claude | Haiku 4.5 | ~$200 |
| Claude | Sonnet 4.6 | ~$600 |
| Claude | Opus 4.6 | ~$1,000 |
| Gemini | 2.5 Flash-Lite | ~$17.50 |
| Gemini | 2.5 Flash | ~$85 |
| Gemini | 2.5 Pro | ~$344 |
Usage volume scales linearly. The output share drives cost: output-heavy workloads (long replies, agentic workflows) cost more because output is priced higher than input.
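The table above can be reproduced in a few lines. Prices come from the comparison table and the workload (100k messages, 750 input / 250 output tokens each) from the example; the helper itself is a sketch:

```python
def monthly_cost(messages: int, in_tok: int, out_tok: int,
                 in_price: float, out_price: float) -> float:
    """Monthly USD cost at per-1M-token list prices."""
    return (messages * in_tok / 1e6) * in_price \
         + (messages * out_tok / 1e6) * out_price

# (input, output) USD per 1M tokens, from the comparison table above
models = {
    "GPT-5 Mini": (0.25, 2.00),
    "GPT-5": (1.25, 10.00),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}
for name, (p_in, p_out) in models.items():
    print(f"{name}: ${monthly_cost(100_000, 750, 250, p_in, p_out):,.2f}")
```

Plugging in any other row of the pricing table gives the corresponding monthly figure.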
Example 2 — Embedding Pipeline
1 million documents × 512 tokens each = 512M embedding tokens. Embedding models are priced at a single per-token rate, with no input/output split.
| Provider | Model | Cost (USD) |
|---|---|---|
| OpenAI | text-embedding-3-small | ~$10.24 |
| OpenAI | text-embedding-3-large | ~$66.56 |
| Gemini | gemini-embedding-001 | ~$76.80 |
| Claude | — | No first-party embedding; RAG typically uses external provider |
OpenAI's entry embedding model is roughly an order of magnitude cheaper than Gemini for the same token volume.
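The embedding figures follow directly from per-1M-token rates; the rates below are back-calculated from the table above, so treat them as illustrative:

```python
DOCS = 1_000_000
TOKENS_PER_DOC = 512
total_m = DOCS * TOKENS_PER_DOC / 1_000_000  # 512M tokens

# USD per 1M embedding tokens, implied by the cost table above
prices = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "gemini-embedding-001": 0.15,
}
for model, rate in prices.items():
    print(f"{model}: ${total_m * rate:.2f}")
```

Because embeddings are a one-off bulk cost, even the "expensive" option here is cheap relative to the chat workloads above.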
Example 3 — Multimodal (Image + Text)
10,000 images/month; 50 text tokens input + 200 output per image. Image tokenization varies by provider (tile-based for OpenAI, media_resolution for Gemini, megapixel formula for Claude). Cost is dominated by image tokens and output.
| Provider | Model | Monthly Cost (USD) |
|---|---|---|
| OpenAI | GPT-5 | ~$29 |
| Claude | Haiku 4.5 | ~$24 |
| Claude | Sonnet 4.6 | ~$72 |
| Claude | Opus 4.6 | ~$119 |
| Gemini | 2.5 Flash-Lite | ~$1.11 |
| Gemini | 2.5 Flash | ~$6 |
| Gemini | 2.5 Pro | ~$24 |
Image token counts depend on resolution and each provider's algorithm. Use the `media_resolution` setting (Gemini) or the `detail` parameter (OpenAI), or resize images before upload (Claude), to control cost.
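As a rough sanity check, the Gemini 2.5 Flash-Lite figure can be reproduced by assuming a flat ~258 tokens per image (a typical per-image count for small images on Gemini; treat both the count and the workload split as assumptions):

```python
IMAGES = 10_000
IMG_TOKENS = 258        # assumed flat per-image token count (small images)
TEXT_IN, TEXT_OUT = 50, 200

# Gemini 2.5 Flash-Lite list prices from the table: $0.10 in / $0.40 out per 1M
in_tok = IMAGES * (IMG_TOKENS + TEXT_IN)   # image tokens bill as input
out_tok = IMAGES * TEXT_OUT
cost = in_tok / 1e6 * 0.10 + out_tok / 1e6 * 0.40
print(f"${cost:.2f}/month")
```

The output tokens account for most of the bill even here, which is why short captions are much cheaper than long image descriptions.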
Sensitivity: Output Share and Token Volume
At 100k messages/month and 1,000 tokens/message, shifting the input/output mix changes cost significantly:
| Mix (input/output) | OpenAI GPT-5 | Claude Sonnet 4.6 | Gemini 2.5 Flash |
|---|---|---|---|
| 60% / 40% | ~$475 | ~$780 | ~$118 |
| 75% / 25% | ~$344 | ~$600 | ~$85 |
| 85% / 15% | ~$256 | ~$480 | ~$63 |
Reducing tokens per turn (shorter prompts, summarisation, retrieval) or moving workloads to mini/lite models is the main cost lever.
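The sensitivity rows can be regenerated by sweeping the output share over a fixed monthly token budget. A sketch using the Claude Sonnet 4.6 prices from the table:

```python
def mix_cost(total_tokens_m: float, out_share: float,
             in_price: float, out_price: float) -> float:
    """Monthly USD cost for a given output share of a fixed token budget (in millions)."""
    out_m = total_tokens_m * out_share
    in_m = total_tokens_m - out_m
    return in_m * in_price + out_m * out_price

# 100M tokens/month; Claude Sonnet 4.6: $3.00 in / $15.00 out per 1M
for share in (0.15, 0.25, 0.40):
    print(f"{int(share * 100)}% output: ${mix_cost(100, share, 3.00, 15.00):,.0f}")
```

Each percentage point shifted from input to output adds $12/month at these rates, which is the whole argument for keeping replies short.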
Which AI API Is Cheapest?
Cheapest for chatbot workloads
Gemini 2.5 Flash-Lite (~$17.50/month for 100k messages) is the lowest token-based cost. GPT-5 Mini (~$69) and Gemini 2.5 Flash (~$85) are competitive for simple tasks.
Cheapest for embeddings
OpenAI text-embedding-3-small (~$10 for 512M tokens) undercuts Gemini. Claude has no first-party embedding; RAG architectures typically add a second provider.
Best value for reasoning
OpenAI GPT-5 and Gemini 2.5 Pro both land around ~$344 for the base chatbot workload. Claude Sonnet (~$600) costs more but offers strong reasoning. For complex tasks, flagship models justify the premium.
Hidden Costs and Limits
Caching storage (Gemini): Context caching includes storage pricing (USD per 1M tokens per hour). Relevant for agent/RAG workloads with large repeated prompts.
Tool calls: OpenAI charges for web search, file search, and containers (per call, per GB-day, per session). Claude web search is $10 per 1,000 searches plus token costs. Gemini grounding has free tiers before billing.
Image tokenization: Token counts vary by resolution and provider. Use token-counting APIs or official formulas to estimate before scaling.
Data residency: Claude offers inference_geo "us-only" with a 1.1x price multiplier for some models. Check compliance requirements.
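Tool-call surcharges are easy to overlook in a token-based budget. A sketch using the Claude web-search rate quoted above, with an assumed monthly volume:

```python
# Per-call surcharge on top of token costs, at the $10 per 1,000 searches
# rate quoted above. The monthly volume is an assumption for illustration.
SEARCHES = 50_000
surcharge = SEARCHES / 1_000 * 10.00
print(f"${surcharge:,.2f}/month before token costs")
```

At 50k searches that surcharge alone exceeds the base chatbot bill for most of the models in the comparison table, so meter tool use separately.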
How Developers Reduce AI API Costs
- Model routing: Send short/FAQ/extraction requests to lite/mini models; reserve flagship models only when reasoning or long output is needed.
- Reduce output: Use structured outputs, limit `max_tokens`, and generate short answers with detail on demand. Verbose responses cost far more than input on most models.
- Prompt caching: Claude cache reads are often 0.1× input cost. OpenAI cached input is ~10x cheaper. Gemini charges caching plus storage; evaluate ROI for your reuse pattern.
- Batch for async workloads: OpenAI Batch API cuts cost; Claude and Gemini have batch options. Use for summarisation, classification, extraction.
- Control media resolution: Use `MEDIA_RESOLUTION_LOW` (Gemini) or `detail: "low"` (OpenAI) when fine-grained vision isn't needed. Resize images for Claude (tokens scale with megapixels).
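The routing idea in the first bullet can be sketched as a toy dispatcher. The thresholds and tier names are illustrative assumptions, not any provider's API:

```python
def pick_tier(prompt: str, needs_reasoning: bool) -> str:
    """Toy router: cheap tier for short/simple requests, flagship otherwise.
    The 2,000-character threshold is an arbitrary illustrative cutoff."""
    if needs_reasoning or len(prompt) > 2_000:
        return "flagship"   # e.g. GPT-5 / Sonnet 4.6 / 2.5 Pro
    return "lite"           # e.g. GPT-5 Mini / Haiku 4.5 / 2.5 Flash-Lite

print(pick_tier("What are your opening hours?", needs_reasoning=False))  # lite
```

In production the routing signal is usually richer (intent classification, user tier, past failure rate), but even a crude length-plus-flag rule captures most of the savings when the bulk of traffic is simple.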
OpenAI vs Claude vs Gemini: Summary
| Provider | Strength | Best For |
|---|---|---|
| OpenAI | Mature ecosystem, Batch API, cheap embeddings | General AI apps, RAG, non-real-time workloads |
| Claude | Strong reasoning, prompt caching | AI agents, complex reasoning, long context |
| Gemini | Google integration, free tier, low token cost | Enterprise, Google tools, budget projects |
Related Guides
For deeper pricing details, see:
- OpenAI API Pricing Explained (2026)
- Claude API Pricing Breakdown (2026)
- Gemini API Pricing Explained (2026)
Conclusion
Choosing an AI API is not only about model performance but also about long-term cost efficiency. Evaluate token pricing, expected usage, output share, and infrastructure scaling before selecting a provider.
For simple, high-volume workloads, Gemini 2.5 Flash-Lite is often cheapest. For balanced cost and capability, GPT-5 and Gemini 2.5 Pro compete. For complex reasoning, Claude flagship models justify their premium. Factor in embeddings, tool calls, and caching behaviour for a complete TCO picture.
Planning Your LLM API Budget?
Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.
Book a Free Strategy Call
Cite this article
- Title: AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs
- Author: Nicola Lazzari
- Published: March 5, 2026
- Updated: March 2026
- URL: https://nicolalazzari.ai/articles/ai-api-pricing-comparison-2026
- Website: nicolalazzari.ai
- Suggested citation: Nicola Lazzari. AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs. nicolalazzari.ai, updated March 2026.
Sources used
Primary sources
- OpenAI API pricing: https://platform.openai.com/docs/pricing
- Anthropic pricing: https://www.anthropic.com/pricing
- Gemini API pricing: https://ai.google.dev/pricing