OpenAI API Pricing Explained (2026): Cost per Token & Real Examples

Nicola Lazzari
OpenAI API pricing comparison showing GPT-5.4, GPT-5, Mini, and Nano token costs

Executive Summary

OpenAI's ecosystem has two distinct billing worlds: ChatGPT (subscriptions) and the OpenAI API (usage-based). ChatGPT plans are priced per user/month (Plus, Pro, Business), while the API is metered primarily by tokens (input, cached input, output) and by non-token units when you use tools (web search, file search), storage (vector stores), or sandboxed execution (containers / code interpreter).

For typical production workloads, the largest cost levers are: output tokens (including reasoning tokens for applicable models); model choice and processing tier (Standard vs Batch vs Flex — Batch is 50% cheaper with 24h turnaround); tool calls and storage (web search, file search, containers); and very long context (pricing multipliers for 1M-context models).

For the sample chatbot workload (100k messages/month, 1k tokens/message, 75% input / 25% output), estimated monthly spend is ~$13.75 (GPT-5 Nano), ~$68.75 (GPT-5 Mini), ~$343.75 (GPT-5), or ~$562.50 (GPT-5.4) at Standard pricing.


How OpenAI API Pricing Works

OpenAI bills for input and output tokens. Cost modifiers include:

  • Prompt caching: Cached input is ~10x cheaper than fresh input for most models
  • Batch API: 50% discount on input and output; 24h completion window
  • Flex processing: Batch rates plus prompt caching discounts; slower, may return "resource unavailable"
  • Priority processing: Paid tier for lower, more consistent latency

API accounts use prepaid billing: minimum $5 purchase, credits expire after 1 year (non-refundable).
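The arithmetic behind these modifiers is simple enough to sketch. The function below is a minimal cost estimator using this article's figures (a flat 50% batch discount on fresh input and output, cached input billed at its own rate); verify rates against OpenAI's current price list before relying on it:

```python
def token_cost(input_tokens, output_tokens, input_rate, output_rate,
               cached_tokens=0, cached_rate=0.0, batch=False):
    """Estimate USD cost for a workload. Rates are USD per 1M tokens.

    Simplifications: Batch halves the fresh-input and output rates;
    cached input is billed at its own (already discounted) rate.
    """
    discount = 0.5 if batch else 1.0
    fresh_input = input_tokens - cached_tokens
    return (fresh_input * input_rate * discount
            + cached_tokens * cached_rate
            + output_tokens * output_rate * discount) / 1_000_000

# GPT-5 Standard ($1.25 in / $10.00 out per 1M): this article's chatbot example
print(token_cost(75_000_000, 25_000_000, 1.25, 10.00))  # 343.75
```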


Token Pricing Table (Standard)

Model          Input (USD/1M)   Cached Input (USD/1M)   Output (USD/1M)
GPT-5.4        $2.50            $0.25                   $15.00
GPT-5          $1.25            $0.125                  $10.00
GPT-5 Mini     $0.25            $0.025                  $2.00
GPT-5 Nano     $0.05            $0.005                  $0.40
GPT-4.1        $2.00            $0.50                   $8.00
GPT-4o         $2.50            $1.25                   $10.00
GPT-4o Mini    $0.15            $0.075                  $0.60

Output pricing explicitly includes reasoning tokens where applicable — "thinking longer" can materially increase spend even when visible text output is short.
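A quick illustration of the reasoning-token effect, using the GPT-5 Standard output rate and hypothetical token counts (200 visible vs 1,800 hidden reasoning tokens are assumptions for the example):

```python
# GPT-5 Standard output rate; token counts below are hypothetical.
OUTPUT_RATE = 10.00  # USD per 1M output tokens

visible, reasoning = 200, 1_800     # reasoning tokens are billed as output
billed_output = visible + reasoning

cost_visible_only = visible * OUTPUT_RATE / 1_000_000  # what you might expect
cost_actual = billed_output * OUTPUT_RATE / 1_000_000  # what you are billed
print(f"${cost_actual:.3f} vs ${cost_visible_only:.3f}")  # $0.020 vs $0.002
```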


Batch API: 50% Cost Reduction

Batch API is designed for non-real-time jobs. You save 50% on both input and output tokens compared to real-time API calls. Ideal for summarisation, classification, extraction, and embeddings. Jobs complete within 24 hours.
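A minimal sketch of the Batch workflow with the official `openai` Python SDK: build a JSONL file of requests, upload it, and create the job. The file name, prompts, and `gpt-5-mini` model name are illustrative; the SDK import is deferred so the request-builder half runs standalone:

```python
import json

def build_batch_file(prompts, path="batch_input.jsonl", model="gpt-5-mini"):
    """Write one chat-completion request per line; custom_id maps results back."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            f.write(json.dumps({
                "custom_id": f"task-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {"model": model,
                         "messages": [{"role": "user", "content": prompt}]},
            }) + "\n")
    return path

def submit_batch(path):
    from openai import OpenAI  # official SDK; deferred import
    client = OpenAI()          # reads OPENAI_API_KEY from the environment
    batch_file = client.files.create(file=open(path, "rb"), purpose="batch")
    return client.batches.create(input_file_id=batch_file.id,
                                 endpoint="/v1/chat/completions",
                                 completion_window="24h")
```

Poll the returned job until its status is `completed`, then download the output file; results arrive within the 24h window at half the Standard token rates.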


Embeddings

Embeddings are priced per input token (no output).

Model                    Standard (USD/1M)   Batch (USD/1M)
text-embedding-3-small   $0.02               $0.01
text-embedding-3-large   $0.13               $0.065
text-embedding-ada-002   $0.10               $0.05

Cost per 1,000 vectors (512 tokens/vector): text-embedding-3-small $0.01024 (Standard) / $0.00512 (Batch); text-embedding-3-large $0.06656 / $0.03328.
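Those per-vector figures fall out of a one-line calculation; a quick sketch using the table's rates:

```python
EMBED_RATES = {  # USD per 1M input tokens: (Standard, Batch)
    "text-embedding-3-small": (0.02, 0.01),
    "text-embedding-3-large": (0.13, 0.065),
    "text-embedding-ada-002": (0.10, 0.05),
}

def embedding_cost(n_vectors, tokens_per_vector, model, batch=False):
    """USD cost to embed n_vectors chunks of tokens_per_vector tokens each."""
    rate = EMBED_RATES[model][1 if batch else 0]
    return n_vectors * tokens_per_vector * rate / 1_000_000

print(embedding_cost(1_000, 512, "text-embedding-3-small"))                  # 0.01024
print(embedding_cost(1_000_000, 512, "text-embedding-3-small", batch=True))  # 5.12
```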


Tools, Storage, and Containers

Hidden cost centers:

  • Web search: $10/1K calls + search content tokens at model rates
  • File search: Vector storage $0.10/GB/day; tool calls $2.50/1K
  • Containers: From Mar 31, 2026: $0.03/container and $0.03/session (20-min session units)
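These per-call and per-day units compose into a simple monthly estimate. The sketch below uses the rates above; the usage numbers are illustrative, and token costs for returned search content are excluded:

```python
def tool_costs(web_search_calls, file_search_calls, vector_store_gb, days=30):
    """Monthly tool spend in USD (search-content token costs excluded)."""
    web = web_search_calls / 1_000 * 10.00      # $10 per 1K web search calls
    calls = file_search_calls / 1_000 * 2.50    # $2.50 per 1K file search calls
    storage = vector_store_gb * 0.10 * days     # $0.10/GB/day vector storage
    return web + calls + storage

# e.g. 50k web searches, 200k file-search calls, 5 GB stored all month
print(tool_costs(50_000, 200_000, 5))  # 1015.0
```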

Real Cost Examples

Chatbot: 100k messages/month, 1k tokens/msg, 75% input / 25% output

75M input + 25M output tokens monthly.

Model         Monthly Cost
GPT-5 Nano    $13.75
GPT-4o Mini   $26.25
GPT-5 Mini    $68.75
GPT-5         $343.75
GPT-5.4       $562.50
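The rows above follow directly from the Standard rates; a sketch that reproduces them, parameterised so you can plug in your own volume and output share:

```python
RATES = {  # (input, output) in USD per 1M tokens, Standard tier
    "GPT-5 Nano":  (0.05, 0.40),
    "GPT-4o Mini": (0.15, 0.60),
    "GPT-5 Mini":  (0.25, 2.00),
    "GPT-5":       (1.25, 10.00),
    "GPT-5.4":     (2.50, 15.00),
}

def monthly_cost(model, messages=100_000, tokens_per_msg=1_000, output_share=0.25):
    """Monthly USD spend for a chat workload at Standard rates."""
    total = messages * tokens_per_msg
    in_rate, out_rate = RATES[model]
    return (total * (1 - output_share) * in_rate
            + total * output_share * out_rate) / 1_000_000

for model in RATES:
    print(f"{model}: ${monthly_cost(model):,.2f}")
```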

Embedding pipeline: 1M documents × 512 tokens

512M tokens. text-embedding-3-small: $10.24 (Standard) / $5.12 (Batch). text-embedding-3-large: $66.56 / $33.28.

Multimodal: 10k images/month (1024×1024, detail high)

Assumes 50 text input tokens and 200 output tokens per image, plus the image tokens themselves. Image tokenization varies by model (tile-based vs patch-based). GPT-5: ~$28.50/month. GPT-4o Mini can be expensive despite its low per-token rates because of its high image token footprint — use detail: "low" when possible. Use POST /v1/responses/input_tokens to get exact counts.
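A back-of-envelope version of that GPT-5 estimate. The ~630 image tokens per image is an assumption chosen to match the ~$28.50 figure; actual image token counts vary by model, resolution, and detail level, so validate with a token-counting request:

```python
IMAGES = 10_000
TEXT_IN, OUT = 50, 200           # per-image text input / output tokens (from above)
IMG_TOKENS = 630                 # ASSUMPTION: tokens per 1024x1024 high-detail image
IN_RATE, OUT_RATE = 1.25, 10.00  # GPT-5 Standard, USD per 1M tokens

monthly = IMAGES * ((TEXT_IN + IMG_TOKENS) * IN_RATE + OUT * OUT_RATE) / 1_000_000
print(round(monthly, 2))  # 28.5
```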


Sensitivity: Token Volume and Output Share

At 100k messages/month, cost varies with tokens per message and output share:

Model                  500 tok/msg   1000 tok/msg   2000 tok/msg
GPT-5 Nano (25% out)   $6.88         $13.75         $27.50
GPT-5 Mini (25% out)   $34.38        $68.75         $137.50
GPT-5 (25% out)        $171.88       $343.75        $687.50

At fixed volume, shifting output share from 15% to 40% nearly doubles spend: with GPT-5 at 1,000 tokens/message, monthly cost rises from ~$256 to ~$475 (+85%).
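The output-share effect can be checked directly against the GPT-5 Standard rates:

```python
IN_RATE, OUT_RATE = 1.25, 10.00   # GPT-5 Standard, USD per 1M tokens

def cost(input_tokens, output_tokens):
    return (input_tokens * IN_RATE + output_tokens * OUT_RATE) / 1_000_000

# 100M tokens/month total, split at 15% vs 40% output share
low = cost(85_000_000, 15_000_000)
high = cost(60_000_000, 40_000_000)
print(low, high)                 # 256.25 475.0
print(f"+{high / low - 1:.0%}")  # +85%
```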


Hidden Costs and Constraints

Data residency: 10% uplift for gpt-5.4 and gpt-5.4-pro when using data residency endpoints (e.g. eu.api.openai.com).

Long-context multipliers: For 1M-context models, prompts above published token thresholds can incur higher input/output multipliers.

ChatGPT subscriptions (separate from API): Plus $20/month; Pro $200/month; Business $25–30/seat/month.


How to Reduce OpenAI API Costs

  • Model routing: Default to GPT-5 Nano or GPT-5 Mini; escalate to GPT-5/GPT-5.4 only for complex reasoning or high-value requests.
  • Constrain output: Cap max_tokens; control reasoning effort where available (reasoning tokens billed as output).
  • Prompt caching: Cache system prompts, RAG chunks, repeated context. Cached input is ~10x cheaper.
  • Batch or Flex for offline work: Batch gives 50% discount; Flex uses Batch rates plus caching.
  • Optimise vision: Use detail: "low" when fine-grained vision isn't needed; resize images; use /v1/responses/input_tokens to validate.
  • Minimise tool calls: Web search and file search add per-call fees. Gate tools behind intent classification.
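Most of these levers are billing-side, but prompt caching also depends on request shape: caching matches on a stable leading span of the request, so static content should come first. A sketch of that ordering (the function and message layout are illustrative, not an official API):

```python
def build_messages(system_prompt, shared_chunks, user_question):
    """Order the request so stable content forms a cacheable prefix.

    Prompt caching matches on the unchanged leading span of a request,
    so the static system prompt and shared RAG context go first and the
    per-request question goes last.
    """
    static_context = "\n\n".join(shared_chunks)  # identical across requests
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user",
         "content": f"Context:\n{static_context}\n\nQuestion: {user_question}"},
    ]

msgs = build_messages("You are a support assistant.",
                      ["Refund policy: ...", "Shipping policy: ..."],
                      "Can I return an opened item?")
```

Keeping the variable question at the end means consecutive requests share the longest possible prefix and hit the ~10x cheaper cached-input rate.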

Related

For a side-by-side comparison with Claude and Gemini, see AI API Pricing Comparison (2026).

Planning Your LLM API Budget?

Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.

Book a Free Strategy Call

Cite this article

  • Title: OpenAI API Pricing Explained (2026): Cost per Token & Real Examples
  • Author: Nicola Lazzari
  • Published: March 1, 2026
  • Updated: March 2026
  • URL: https://nicolalazzari.ai/articles/openai-api-pricing-explained-2026
  • Website: nicolalazzari.ai
  • Suggested citation: Nicola Lazzari. OpenAI API Pricing Explained (2026): Cost per Token & Real Examples. nicolalazzari.ai, updated March 2026.


Audience

Developers, founders, product teams


This article may be referenced in research, documentation, or AI datasets. Please cite the original source when possible.

Frequently Asked Questions

How does OpenAI API token billing work?
OpenAI charges per million tokens for input (prompt + context) and output (model response). Input and cached input have different rates — cached input is roughly 10x cheaper for most models. Output tokens include reasoning tokens where applicable, and output is typically 4–8x more expensive than input.

What is the Batch API and how much does it save?
The Batch API lets you submit non-real-time jobs (summarisation, classification, extraction, embeddings) and receive results within 24 hours. You save 50% on both input and output tokens compared to real-time API calls.

When should I use prompt caching?
Use prompt caching when you repeatedly send the same context — long system prompts, RAG document chunks, or shared knowledge bases. The first request pays the full input rate; subsequent requests that hit the cache pay the much lower cached rate.

How much do tools and containers cost?
Web search: $10 per 1,000 calls plus token costs. File search: $0.10/GB/day storage, $2.50/1K tool calls. Containers (from Mar 31, 2026): $0.03/container and $0.03 per 20-minute session.

How can I reduce OpenAI API costs?
Model routing (default to Nano/Mini), constrain output and reasoning tokens, use prompt caching, shift offline work to Batch or Flex, optimise vision with detail level and image sizing, and minimise tool calls.

Related reading

Claude API pricing 2026: Opus, Sonnet, Haiku token costs. Prompt caching, Batch API 50% off, tooling, multimodal, no embeddings, and cost optimisation for developers.

Read next: Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Related Resources

Article
AI API Pricing Comparison (2026)

Compare OpenAI, Claude, and Gemini token costs and real usage examples.

Read more →
Guide
The Complete Guide to AI Workflow Automation

Learn how to identify automation opportunities, choose the right tools, and measure ROI.

Read more →
Consulting
AI Customer Support Automation

See how AI automation can resolve 70% of customer support tickets automatically.

Read more →
Case Study
AI Workflow Automation Case Study

Real-world example of AI automation delivering measurable results.

Read more →