AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs

Nicola Lazzari
AI pricing calculator interface comparing API model costs

Introduction

AI APIs are now widely used for chatbots, copilots, automation, and SaaS products. Most platforms charge based on tokens, meaning developers must estimate usage costs before building products.

The three main providers are OpenAI, Anthropic Claude, and Google Gemini. Their pricing models differ, and understanding those differences can significantly impact infrastructure costs. This comparison uses official pricing pages and documentation as of March 2026.


How AI API Pricing Works

Tokens are the unit of text that models process. Roughly, one token equals four characters or three-quarters of a word in English.

Providers charge separately for input tokens (what you send) and output tokens (what the model generates). Output is typically 5–10x more expensive than input because generation is computationally heavier. Cost is calculated as: token_count / 1,000,000 × price_per_1M_tokens.
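This formula is a one-liner in Python (the prices used here for illustration are the GPT-5 list prices from the comparison table below):

```python
def token_cost(input_tokens, output_tokens, input_per_1m, output_per_1m):
    """Cost in USD given raw token counts and per-1M-token list prices."""
    return (input_tokens / 1_000_000) * input_per_1m + \
           (output_tokens / 1_000_000) * output_per_1m

# One 1,000-token chat turn at GPT-5 list prices ($1.25 in / $10.00 out per 1M)
print(token_cost(750, 250, 1.25, 10.00))  # fractions of a cent per turn
```

Per-request costs are tiny; they only become meaningful multiplied by monthly volume, which is what the examples below do.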

Larger models cost more. Long responses increase cost. High-volume apps must carefully estimate token usage before scaling. Even when input prices look similar across providers, output prices and the share of output in your workload can shift monthly TCO dramatically.


AI API Pricing Comparison Table

Official list prices (USD per 1M tokens) from provider pricing pages:

| Provider | Model | Input (USD/1M) | Output (USD/1M) | Caching | Notes |
|---|---|---|---|---|---|
| OpenAI | GPT-5 | $1.25 | $10.00 | cached $0.125 | General purpose |
| OpenAI | GPT-5 Mini | $0.25 | $2.00 | cached $0.025 | Budget option |
| OpenAI | GPT-5.4 | $2.50 | $15.00 | cached $0.25 | Tiered for >272K context |
| Claude | Opus 4.6 | $5.00 | $25.00 | cache hit $0.50 | Flagship reasoning |
| Claude | Sonnet 4.6 | $3.00 | $15.00 | cache hit $0.30 | Cost/performance sweet spot |
| Claude | Haiku 4.5 | $1.00 | $5.00 | cache hit $0.10 | High-volume tasks |
| Gemini | 2.5 Pro | $1.25 (≤200k) / $2.50 (>200k) | $10 / $15 | context + storage | Tiered by prompt size |
| Gemini | 2.5 Flash | $0.30 | $2.50 | context $0.03 | High volume, low latency |
| Gemini | 2.5 Flash-Lite | $0.10 | $0.40 | context $0.01 | Lowest cost floor |

OpenAI offers cached input (often ~10x cheaper than fresh input) and a Batch API (50% off for non-real-time jobs). Claude pairs strong reasoning with prompt caching; cache hits are often ~90% cheaper than fresh input. Gemini offers a free tier, context caching with separate storage pricing, and tight Google ecosystem integration.
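As a sketch of what caching is worth, assume a 2,000-token system prompt resent on every one of 100,000 monthly requests (an illustrative workload, not one of the examples below), priced at GPT-5's fresh vs cached input rates from the table:

```python
# Illustrative caching savings: repeated system prompt billed fresh vs cached
requests = 100_000
prompt_tokens = 2_000
total = requests * prompt_tokens  # 200M prompt tokens per month

fresh = total / 1e6 * 1.25    # GPT-5 fresh input: $1.25 per 1M tokens
cached = total / 1e6 * 0.125  # GPT-5 cached input: $0.125 per 1M tokens

print(f"fresh: ${fresh:.2f}, cached: ${cached:.2f}")  # $250.00 vs $25.00
```

The same arithmetic applies to Claude cache hits and Gemini context caching, though Gemini's storage charge must be added on top.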


Real Cost Examples

Example 1 — Chatbot (100k messages/month)

Assume 100,000 messages per month, 1,000 tokens per message, split 75% input / 25% output (750 input, 250 output per message). That's 75M input + 25M output tokens monthly.

| Provider | Model | Monthly Cost (USD) |
|---|---|---|
| OpenAI | GPT-5 Mini | ~$69 |
| OpenAI | GPT-5 | ~$344 |
| OpenAI | GPT-5.4 | ~$563 |
| Claude | Haiku 4.5 | ~$200 |
| Claude | Sonnet 4.6 | ~$600 |
| Claude | Opus 4.6 | ~$1,000 |
| Gemini | 2.5 Flash-Lite | ~$17.50 |
| Gemini | 2.5 Flash | ~$85 |
| Gemini | 2.5 Pro | ~$344 |

Cost scales linearly with message volume. The output share is the main driver: output-heavy workloads (long replies, agentic workflows) cost more because output tokens are priced several times higher than input.
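These figures can be reproduced with a short script (list prices taken from the comparison table above; caching and batch discounts ignored):

```python
# List prices (USD per 1M tokens) from the comparison table
PRICES = {
    "GPT-5":                 (1.25, 10.00),
    "GPT-5 Mini":            (0.25, 2.00),
    "Claude Sonnet 4.6":     (3.00, 15.00),
    "Claude Haiku 4.5":      (1.00, 5.00),
    "Gemini 2.5 Flash":      (0.30, 2.50),
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
}

def monthly_cost(model, input_m, output_m):
    """USD for input_m / output_m million tokens per month at list price."""
    inp, out = PRICES[model]
    return input_m * inp + output_m * out

# Example 1 workload: 75M input + 25M output tokens per month
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 75, 25):,.2f}")
```

Swapping in your own token estimates (or adding the tiered Gemini 2.5 Pro prices) turns this into a quick budgeting tool.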

Example 2 — Embedding Pipeline

1 million documents × 512 tokens each = 512M embedding tokens. Embedding models are priced per input token only; there is no separate output charge.

| Provider | Model | Cost (USD) |
|---|---|---|
| OpenAI | text-embedding-3-small | ~$10.24 |
| OpenAI | text-embedding-3-large | ~$66.56 |
| Gemini | gemini-embedding-001 | ~$76.80 |
| Claude | (no first-party embedding) | n/a; RAG typically uses an external provider |

OpenAI's entry embedding model is roughly an order of magnitude cheaper than Gemini for the same token volume.
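The embedding figures follow from the per-1M prices implied by the table (~$0.02 for text-embedding-3-small, ~$0.15 for gemini-embedding-001):

```python
# Embedding pipeline cost: flat per-token pricing, no input/output split
docs = 1_000_000
tokens_per_doc = 512
total_tokens = docs * tokens_per_doc  # 512M tokens

price_small = 0.02   # text-embedding-3-small, USD per 1M tokens
price_gemini = 0.15  # gemini-embedding-001, USD per 1M tokens

print(total_tokens / 1e6 * price_small)   # ~10.24
print(total_tokens / 1e6 * price_gemini)  # ~76.80
```

Note that re-embedding on document updates multiplies this cost, so chunking and change detection matter at scale.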

Example 3 — Multimodal (Image + Text)

10,000 images/month; 50 text tokens input + 200 output per image. Image tokenization varies by provider (tile-based for OpenAI, media_resolution for Gemini, megapixel formula for Claude). Cost is dominated by image tokens and output.

| Provider | Model | Monthly Cost (USD) |
|---|---|---|
| OpenAI | GPT-5 | ~$29 |
| Claude | Haiku 4.5 | ~$24 |
| Claude | Sonnet 4.6 | ~$72 |
| Claude | Opus 4.6 | ~$119 |
| Gemini | 2.5 Flash-Lite | ~$1.11 |
| Gemini | 2.5 Flash | ~$6 |
| Gemini | 2.5 Pro | ~$24 |

Image token counts depend on resolution and provider algorithms. Use media_resolution (Gemini), detail (OpenAI), or resize images (Claude) to control cost.
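A rough estimator treats image tokens as ordinary input tokens; the ~800 tokens-per-image figure below is an illustrative assumption, not an official count, so replace it with your provider's token-counting API or published formula:

```python
def image_request_cost(images, tokens_per_image, text_in, text_out,
                       input_per_1m, output_per_1m):
    """Rough multimodal cost: image tokens are billed at the input rate.

    tokens_per_image is an assumption; it varies with resolution and
    detail/media_resolution settings, so verify it per provider.
    """
    input_tokens = images * (tokens_per_image + text_in)
    output_tokens = images * text_out
    return (input_tokens / 1e6) * input_per_1m + \
           (output_tokens / 1e6) * output_per_1m

# Illustrative: 10,000 images/month at an assumed ~800 tokens each, GPT-5 prices
print(image_request_cost(10_000, 800, 50, 200, 1.25, 10.00))
```

Even with a crude tokens-per-image guess, the estimator makes the structure clear: image input and generated output dominate, the 50 text tokens are negligible.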


Sensitivity: Output Share and Token Volume

At 100k messages/month and 1,000 tokens/message, shifting the input/output mix changes cost significantly:

| Mix (input/output) | OpenAI GPT-5 | Claude Sonnet 4.6 | Gemini 2.5 Flash |
|---|---|---|---|
| 60% / 40% | ~$475 | ~$780 | ~$118 |
| 75% / 25% | ~$344 | ~$600 | ~$85 |
| 85% / 15% | ~$256 | ~$480 | ~$63 |

Reducing tokens per turn (shorter prompts, summarisation, retrieval) or moving workloads to mini/lite models is the main cost lever.
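The sensitivity rows above follow directly from the mix, as a small sweep shows (GPT-5 list prices from the comparison table):

```python
def mix_cost(total_m, output_share, input_per_1m, output_per_1m):
    """Monthly USD for total_m million tokens at a given output share."""
    out_m = total_m * output_share
    in_m = total_m - out_m
    return in_m * input_per_1m + out_m * output_per_1m

# 100M tokens/month on GPT-5 ($1.25 in / $10.00 out per 1M tokens)
for share in (0.40, 0.25, 0.15):
    print(f"{share:.0%} output: ${mix_cost(100, share, 1.25, 10.00):,.2f}")
```

Because output is 8x the input price here, every 10-point shift in output share moves the monthly bill by roughly $90 at this volume.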


Which AI API Is Cheapest?

Cheapest for chatbot workloads

Gemini 2.5 Flash-Lite (~$17.50/month for 100k messages) has the lowest token cost of the models compared here. GPT-5 Mini (~$69) and Gemini 2.5 Flash (~$85) are competitive for simple tasks.

Cheapest for embeddings

OpenAI text-embedding-3-small (~$10 for 512M tokens) undercuts Gemini. Claude has no first-party embedding; RAG architectures typically add a second provider.

Best value for reasoning

OpenAI GPT-5 and Gemini 2.5 Pro both land around ~$344 for the base chatbot workload. Claude Sonnet (~$600) costs more but offers strong reasoning. For complex tasks, flagship models justify the premium.


Hidden Costs and Limits

Caching storage (Gemini): Context caching includes storage pricing (USD per 1M tokens per hour). Relevant for agent/RAG workloads with large repeated prompts.

Tool calls: OpenAI charges for web search, file search, and containers (per call, per GB-day, per session). Claude web search is $10 per 1,000 searches plus token costs. Gemini grounding has free tiers before billing.

Image tokenization: Token counts vary by resolution and provider. Use token-counting APIs or official formulas to estimate before scaling.

Data residency: Claude offers inference_geo "us-only" with a 1.1x price multiplier for some models. Check compliance requirements.


How Developers Reduce AI API Costs

  • Model routing: Send short/FAQ/extraction requests to lite/mini models; reserve flagship models only when reasoning or long output is needed.
  • Reduce output: Use structured outputs, limit max_tokens, and generate short answers with detail on demand. Output tokens cost several times more than input, so verbose responses dominate spend on most models.
  • Prompt caching: Claude cache reads are often 0.1x the fresh input price, and OpenAI cached input is ~10x cheaper. Gemini charges for caching plus storage, so evaluate the ROI for your reuse pattern.
  • Batch for async workloads: OpenAI Batch API cuts cost; Claude and Gemini have batch options. Use for summarisation, classification, extraction.
  • Control media resolution: Use MEDIA_RESOLUTION_LOW (Gemini) or detail: "low" (OpenAI) when fine-grained vision isn't needed. Resize images for Claude (tokens scale with megapixels).
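The model-routing idea can be sketched as a simple dispatcher. The thresholds and model identifiers below are placeholders, not official model IDs; tune both against your own quality evals before relying on them:

```python
def route_model(prompt, needs_reasoning=False):
    """Toy routing heuristic: cheap model by default, flagship on demand.

    Thresholds and model names are illustrative placeholders.
    """
    if needs_reasoning:
        return "claude-sonnet-4.6"      # flagship only when reasoning is needed
    if len(prompt) < 400:               # short FAQ/extraction-style request
        return "gemini-2.5-flash-lite"  # cheapest tier for simple turns
    return "gpt-5-mini"                 # mid-tier default for longer prompts

print(route_model("What are your opening hours?"))      # lite model
print(route_model("Summarise this contract", needs_reasoning=True))
```

Production routers usually classify with a cheap model or embeddings rather than prompt length, but even this crude split captures most of the savings when the bulk of traffic is simple.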

OpenAI vs Claude vs Gemini: Summary

| Provider | Strength | Best For |
|---|---|---|
| OpenAI | Mature ecosystem, Batch API, cheap embeddings | General AI apps, RAG, non-real-time workloads |
| Claude | Strong reasoning, prompt caching | AI agents, complex reasoning, long context |
| Gemini | Google integration, free tier, low token cost | Enterprise, Google tools, budget projects |


Conclusion

Choosing an AI API is not only about model performance but also about long-term cost efficiency. Evaluate token pricing, expected usage, output share, and infrastructure scaling before selecting a provider.

For simple, high-volume workloads, Gemini 2.5 Flash-Lite is often cheapest. For balanced cost and capability, GPT-5 and Gemini 2.5 Pro compete. For complex reasoning, Claude flagship models justify their premium. Factor in embeddings, tool calls, and caching behaviour for a complete TCO picture.

Planning Your LLM API Budget?

Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.

Book a Free Strategy Call

Cite this article

  • Title: AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs
  • Author: Nicola Lazzari
  • Published: March 5, 2026
  • Updated: March 2026
  • URL: https://nicolalazzari.ai/articles/ai-api-pricing-comparison-2026
  • Website: nicolalazzari.ai
  • Suggested citation: Nicola Lazzari. AI API Pricing Comparison (2026): OpenAI vs Claude vs Gemini Costs. nicolalazzari.ai, updated March 2026.




Frequently Asked Questions

What is the cheapest AI API for chatbots?
For simple, high-volume chatbot workloads (100k messages/month, 75/25 input/output), Gemini 2.5 Flash-Lite is typically cheapest (~$17.50/month). GPT-5 Mini (~$69) and Gemini 2.5 Flash (~$85) are competitive budget options, while GPT-5.4 (~$563) sits at the quality-first end. For embeddings, OpenAI text-embedding-3-small undercuts Gemini.

How do OpenAI, Claude, and Gemini compare on cost?
At 100k messages/month (75/25 mix), OpenAI GPT-5.4 (~$563), Claude Sonnet 4.6 (~$600), and Gemini 2.5 Pro (~$344) show a wide spread by quality/cost target. Claude Opus (~$1,000) is the most expensive; Gemini Flash-Lite (~$17.50) is the cheapest. OpenAI's Batch API and cached input offer additional savings.

What is the difference between input and output tokens?
Input tokens are what you send (prompt, context, system message); output tokens are what the model generates. Providers charge them separately, and output is typically 5–10x more expensive than input. The output share of your workload drives monthly cost: verbose responses and agentic workflows cost more.

What hidden costs should I watch for?
Gemini context caching includes storage pricing per token-hour. Tool calls (web search, file search) carry per-call or per-query fees on all providers. Image tokenization varies by resolution. Claude data residency (e.g. us-only) can add a 1.1x multiplier. Check official pricing pages for your workload.


Related Resources

  • OpenAI API Pricing Explained (2026): GPT-5.2, mini, pro token costs. Batch API 50% off and cost optimisation tips.
  • Claude API Pricing Breakdown (2026): Opus, Sonnet, Haiku token costs. Caching and rate limits.
  • Gemini API Pricing Explained (2026): Flash and Flash-Lite token costs, thinking tokens, free tier.