OpenAI API Pricing Explained (2026): Cost per Token & Real Examples

nicolalazzari
[Image: OpenAI API pricing comparison showing GPT-5.2, mini, and pro token costs]

OpenAI Pricing at a Glance

Understanding OpenAI's pricing model is essential for anyone building with the API. Costs are driven by tokens — both input (what you send) and output (what the model generates) — with significant savings available through caching and the Batch API.

Model          Input (per 1M tokens)   Cached Input (per 1M tokens)   Output (per 1M tokens)
GPT-5.2        $1.75                   $0.175                         $14.00
GPT-5.2 Pro    $21.00                  —                              $168.00
GPT-5 Mini     $0.25                   $0.025                         $2.00

Batch API: Save 50% on inputs and outputs for jobs that complete within 24 hours. Ideal for summarisation, classification, and other non-real-time workloads.


How Token Billing Works

OpenAI charges separately for three token types:

  • Input tokens: Your prompt, system message, and any context you send. Billed at the standard input rate.
  • Cached input tokens: When you use prompt caching, repeated context (e.g. long documents, RAG chunks) is billed at a much lower rate once cached. Cache hits are cheaper than fresh input.
  • Output tokens: The model's response. Output is typically 5–10x more expensive than input, so verbose outputs drive cost quickly.

The ratio of input to output matters. A chatbot with 1,000 input tokens and 250 output tokens per turn has a 4:1 input-to-output ratio. Agentic workflows with long tool outputs can flip that ratio and become output-heavy — and expensive.
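To see why the ratio matters, here is a minimal per-turn cost sketch using the GPT-5.2 rates from the table above; `turn_cost` is an illustrative helper, not part of any SDK:

```python
# Article's GPT-5.2 rates, in USD per 1M tokens (uncached).
INPUT_RATE = 1.75
OUTPUT_RATE = 14.00

def turn_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request at standard (uncached) rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# A 1,000-in / 250-out chatbot turn: despite the 4:1 input ratio,
# output is still the larger cost line, because output here costs
# 8x as much per token as input.
cost = turn_cost(1_000, 250)                              # $0.00525 per turn
input_share = (1_000 * INPUT_RATE) / (cost * 1_000_000)   # ≈ 1/3 of the bill
```

Even an input-heavy turn spends roughly two thirds of its budget on output at these rates, which is why output-heavy agentic workflows get expensive fast.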


Cost Calculator Formula

Use this formula to estimate monthly cost:

monthly_cost = (input_tokens / 1_000_000 * input_rate) + (output_tokens / 1_000_000 * output_rate)

# Example: 100M input + 25M output on GPT-5.2
# (100 * 1.75) + (25 * 14) = 175 + 350 = $525/month

For cached input, substitute the cached rate for the portion of input that hits the cache. Batch API cuts both input and output rates by 50%.
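The formula, including the cache and batch adjustments described above, can be sketched as one function. The parameter names (`cache_hit_fraction`, etc.) are illustrative, not an official API:

```python
def monthly_cost(input_tokens, output_tokens, input_rate, output_rate,
                 cached_rate=None, cache_hit_fraction=0.0, batch=False):
    """Estimate monthly spend in USD. Rates are per 1M tokens;
    token counts are raw counts (not millions)."""
    fresh = input_tokens * (1 - cache_hit_fraction) * input_rate
    cached = input_tokens * cache_hit_fraction * (cached_rate or input_rate)
    out = output_tokens * output_rate
    total = (fresh + cached + out) / 1_000_000
    return total * 0.5 if batch else total   # Batch API: 50% off in and out

# Worked example from above: 100M in + 25M out on GPT-5.2.
base = monthly_cost(100e6, 25e6, 1.75, 14.00)   # 525.0

# Same workload, half the input served from cache at $0.175/1M.
with_cache = monthly_cost(100e6, 25e6, 1.75, 14.00,
                          cached_rate=0.175, cache_hit_fraction=0.5)  # 446.25
```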


Three Real Workloads

1. Chatbot (1,000 in / 250 out per turn)

High volume, moderate context. Caching helps if you reuse system prompts or RAG chunks. GPT-5 mini is often sufficient for simple Q&A; GPT-5.2 for complex reasoning.

2. Summariser (5,000 in / 500 out per doc)

Input-heavy. Batch API is ideal — summaries don't need real-time latency. Caching document chunks across batches can further reduce cost.

3. Agentic Workflow (2,000 in / 1,500 out per step, 5 steps)

Output-heavy. Tool results and long reasoning add up. Consider smaller models for routing and tool use; reserve flagship models for final synthesis.
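Putting monthly numbers on the three workloads above, at GPT-5.2 rates and with assumed volumes (the per-month counts are illustrative, not benchmarks):

```python
# GPT-5.2 rates from this article, USD per 1M tokens.
RATE_IN, RATE_OUT = 1.75, 14.00

def cost(tokens_in, tokens_out):
    return (tokens_in * RATE_IN + tokens_out * RATE_OUT) / 1_000_000

chatbot    = cost(1_000, 250) * 100_000          # 100k turns/month  -> $525.00
summariser = cost(5_000, 500) * 10_000           # 10k docs/month    -> $157.50
agent      = cost(2_000 * 5, 1_500 * 5) * 1_000  # 1k runs x 5 steps -> $122.50
```

Note the agent: per run, output costs about six times as much as input (7,500 output tokens at $14 vs 10,000 input tokens at $1.75), which is the "flipped ratio" described above.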


Cost Levers

  • Caching: Cache system prompts, RAG chunks, and repeated context. Cached input is ~10x cheaper than fresh input.
  • Batch API: 50% off for non-real-time jobs. Run summarisation, classification, and extraction in batch.
  • Smaller model: GPT-5 mini is 7x cheaper than GPT-5.2 on input and output. Use for simple tasks.
  • Shorter outputs: Set max_tokens and use structured outputs to avoid runaway generation.
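The levers above can be compared on the 100M-in / 25M-out GPT-5.2 workload used throughout this article; the 80% cache-hit fraction is an assumption for illustration:

```python
def spend(m_in, m_out, r_in, r_out):
    """Monthly USD; token counts already in millions."""
    return m_in * r_in + m_out * r_out

baseline = spend(100, 25, 1.75, 14.00)                             # $525.00
# Caching: 80% of input billed at the cached rate ($0.175/1M).
cached   = spend(100 * 0.2, 25, 1.75, 14.00) + 100 * 0.8 * 0.175   # $399.00
batched  = baseline * 0.5                                          # $262.50
mini     = spend(100, 25, 0.25, 2.00)                              # $75.00
```

Caching alone saves less than batching here because output dominates the bill; switching to GPT-5 Mini cuts both rates and is the biggest single lever for tasks it can handle.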

Common Mistakes

  • Context explosion: Sending full conversation history without truncation. Summarise or drop old turns.
  • Verbose system prompts: Long system messages are billed as input every request. Keep them lean or cache them.
  • No truncation: Failing to limit output length leads to unnecessary tokens. Always set reasonable max_tokens.
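A minimal sketch of the history-truncation fix: keep only the most recent turns that fit a token budget. `estimate_tokens` is a crude word-count stand-in; a real implementation would use the model's tokenizer (e.g. tiktoken):

```python
def estimate_tokens(text: str) -> int:
    """Rough proxy: words ~ tokens. Replace with a real tokenizer."""
    return len(text.split())

def truncate_history(messages, budget):
    """Keep the most recent messages whose combined size fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk newest-first
        t = estimate_tokens(msg["content"])
        if used + t > budget:
            break                        # oldest remaining turns are dropped
        kept.append(msg)
        used += t
    return list(reversed(kept))          # restore chronological order
```

Summarising dropped turns into a single short message (instead of discarding them) is a common refinement when older context still matters.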

Monthly Workload Example

Assume 100,000 chats/month, with 1,000 input tokens and 250 output tokens per chat on average:

  • 100M input tokens + 25M output tokens monthly
Model          Calculation                        Monthly Cost
GPT-5 Mini     (100 × $0.25) + (25 × $2.00)       $75
GPT-5.2        (100 × $1.75) + (25 × $14.00)      $525
GPT-5.2 Pro    (100 × $21.00) + (25 × $168.00)    $6,300

Async batch version: ~50% less if your workload is eligible for the Batch API.
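The batch-eligible version of that table, recomputed with the 50% discount applied (rates are the article's published figures):

```python
# (input rate, output rate) per 1M tokens, from the pricing table above.
rates = {
    "GPT-5 Mini":  (0.25, 2.00),
    "GPT-5.2":     (1.75, 14.00),
    "GPT-5.2 Pro": (21.00, 168.00),
}

def monthly(r_in, r_out, batch=False):
    total = 100 * r_in + 25 * r_out   # 100M input, 25M output (in millions)
    return total * 0.5 if batch else total

batch_costs = {m: monthly(*r, batch=True) for m, r in rates.items()}
# GPT-5 Mini: $37.50, GPT-5.2: $262.50, GPT-5.2 Pro: $3,150.00
```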

Planning Your LLM API Budget?

Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.

Book a Free Strategy Call

Frequently Asked Questions

How does OpenAI API pricing work?
OpenAI charges per million tokens for input (prompt + context) and output (model response). Input and cached input have different rates — cached input is roughly 10x cheaper. Output tokens are typically 5–10x more expensive than input, so output-heavy workloads cost more.

What is the Batch API, and how much does it save?
The Batch API lets you submit non-real-time jobs (summarisation, classification, extraction) and receive results within 24 hours. You save 50% on both input and output tokens compared to real-time API calls. Ideal for workloads that don't require immediate responses.

When should I use prompt caching?
Use prompt caching when you repeatedly send the same context — for example, long system prompts, RAG document chunks, or shared knowledge bases. The first request pays the full input rate; subsequent requests that hit the cache pay the much lower cached rate.

How can I reduce my OpenAI API costs?
Four main levers: (1) Use caching for repeated context, (2) Use the Batch API for non-real-time jobs (50% off), (3) Choose smaller models (GPT-5 mini) for simple tasks, (4) Limit output length with max_tokens and avoid context explosion by truncating conversation history.

Related reading

Claude API pricing 2026: Opus, Sonnet, and Haiku token costs. Caching, rate limits, and real workload examples for developers.

Read next: Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Related Resources

Article
AI API Pricing Comparison (2026)

Compare OpenAI, Claude, and Gemini token costs and real usage examples.

Guide
The Complete Guide to AI Workflow Automation

Learn how to identify automation opportunities, choose the right tools, and measure ROI.

Consulting
AI Customer Support Automation

See how AI automation can resolve 70% of customer support tickets automatically.

Case Study
AI Workflow Automation Case Study

Real-world example of AI automation delivering measurable results.
