OpenAI API Pricing Explained (2026): Cost per Token & Real Examples

nicolalazzari
[Image: OpenAI API pricing comparison showing GPT-5.2, mini, and pro token costs]

OpenAI Pricing at a Glance

Understanding OpenAI's pricing model is essential for anyone building with the API. Costs are driven by tokens — both input (what you send) and output (what the model generates) — with significant savings available through caching and the Batch API.

Model          Input (per 1M tokens)   Cached Input (per 1M tokens)   Output (per 1M tokens)
GPT-5.2        $1.75                   $0.175                         $14.00
GPT-5.2 Pro    $21.00                  —                              $168.00
GPT-5 Mini     $0.25                   $0.025                         $2.00

Batch API: Save 50% on inputs and outputs for jobs that complete within 24 hours. Ideal for summarisation, classification, and other non-real-time workloads.


How Token Billing Works

OpenAI charges separately for three token types:

  • Input tokens: Your prompt, system message, and any context you send. Billed at the standard input rate.
  • Cached input tokens: When you use prompt caching, repeated context (e.g. long documents, RAG chunks) is billed at a much lower rate once cached. Cache hits are cheaper than fresh input.
  • Output tokens: The model's response. Output is typically 5–10x more expensive than input, so verbose outputs drive cost quickly.

The ratio of input to output matters. A chatbot with 1,000 input tokens and 250 output tokens per turn has a 4:1 input-to-output ratio. Agentic workflows with long tool outputs can flip that ratio and become output-heavy — and expensive.
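To see why the ratio matters, here is a minimal per-turn cost sketch using the GPT-5.2 rates from the table above; `turn_cost` is an illustrative helper, not part of any SDK:

```python
# Article's GPT-5.2 rates, in USD per 1M tokens (uncached).
INPUT_RATE = 1.75
OUTPUT_RATE = 14.00

def turn_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request at standard (uncached) rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# A 1,000-in / 250-out chatbot turn: despite the 4:1 input ratio,
# output is still the larger cost line, because output here costs
# 8x as much per token as input.
cost = turn_cost(1_000, 250)                              # $0.00525 per turn
input_share = (1_000 * INPUT_RATE) / (cost * 1_000_000)   # ≈ 1/3 of the bill
```

Even an input-heavy turn spends roughly two thirds of its budget on output at these rates, which is why output-heavy agentic workflows get expensive fast.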


Cost Calculator Formula

Use this formula to estimate monthly cost:

monthly_cost = (input_tokens / 1_000_000 * input_rate) + (output_tokens / 1_000_000 * output_rate)

# Example: 100M input + 25M output on GPT-5.2
# (100 * 1.75) + (25 * 14) = 175 + 350 = $525/month

For cached input, substitute the cached rate for the portion of input that hits the cache. Batch API cuts both input and output rates by 50%.
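The formula, including the cache and batch adjustments described above, can be sketched as one function. The parameter names (`cache_hit_fraction`, etc.) are illustrative, not an official API:

```python
def monthly_cost(input_tokens, output_tokens, input_rate, output_rate,
                 cached_rate=None, cache_hit_fraction=0.0, batch=False):
    """Estimate monthly spend in USD. Rates are per 1M tokens;
    token counts are raw counts (not millions)."""
    fresh = input_tokens * (1 - cache_hit_fraction) * input_rate
    cached = input_tokens * cache_hit_fraction * (cached_rate or input_rate)
    out = output_tokens * output_rate
    total = (fresh + cached + out) / 1_000_000
    return total * 0.5 if batch else total   # Batch API: 50% off in and out

# Worked example from above: 100M in + 25M out on GPT-5.2.
base = monthly_cost(100e6, 25e6, 1.75, 14.00)   # 525.0

# Same workload, half the input served from cache at $0.175/1M.
with_cache = monthly_cost(100e6, 25e6, 1.75, 14.00,
                          cached_rate=0.175, cache_hit_fraction=0.5)  # 446.25
```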


Three Real Workloads

1. Chatbot (1,000 in / 250 out per turn)

High volume, moderate context. Caching helps if you reuse system prompts or RAG chunks. GPT-5 mini is often sufficient for simple Q&A; GPT-5.2 for complex reasoning.

2. Summariser (5,000 in / 500 out per doc)

Input-heavy. Batch API is ideal — summaries don't need real-time latency. Caching document chunks across batches can further reduce cost.

3. Agentic Workflow (2,000 in / 1,500 out per step, 5 steps)

Output-heavy. Tool results and long reasoning add up. Consider smaller models for routing and tool use; reserve flagship models for final synthesis.
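Putting monthly numbers on the three workloads above, at GPT-5.2 rates and with assumed volumes (the per-month counts are illustrative, not benchmarks):

```python
# GPT-5.2 rates from this article, USD per 1M tokens.
RATE_IN, RATE_OUT = 1.75, 14.00

def cost(tokens_in, tokens_out):
    return (tokens_in * RATE_IN + tokens_out * RATE_OUT) / 1_000_000

chatbot    = cost(1_000, 250) * 100_000          # 100k turns/month  -> $525.00
summariser = cost(5_000, 500) * 10_000           # 10k docs/month    -> $157.50
agent      = cost(2_000 * 5, 1_500 * 5) * 1_000  # 1k runs x 5 steps -> $122.50
```

Note the agent: per run, output costs about six times as much as input (7,500 output tokens at $14 vs 10,000 input tokens at $1.75), which is the "flipped ratio" described above.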


Cost Levers

  • Caching: Cache system prompts, RAG chunks, and repeated context. Cached input is ~10x cheaper than fresh input.
  • Batch API: 50% off for non-real-time jobs. Run summarisation, classification, and extraction in batch.
  • Smaller model: GPT-5 mini is 7x cheaper than GPT-5.2 on input and output. Use for simple tasks.
  • Shorter outputs: Set max_tokens and use structured outputs to avoid runaway generation.
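The levers above can be compared on the 100M-in / 25M-out GPT-5.2 workload used throughout this article; the 80% cache-hit fraction is an assumption for illustration:

```python
def spend(m_in, m_out, r_in, r_out):
    """Monthly USD; token counts already in millions."""
    return m_in * r_in + m_out * r_out

baseline = spend(100, 25, 1.75, 14.00)                             # $525.00
# Caching: 80% of input billed at the cached rate ($0.175/1M).
cached   = spend(100 * 0.2, 25, 1.75, 14.00) + 100 * 0.8 * 0.175   # $399.00
batched  = baseline * 0.5                                          # $262.50
mini     = spend(100, 25, 0.25, 2.00)                              # $75.00
```

Caching alone saves less than batching here because output dominates the bill; switching to GPT-5 Mini cuts both rates and is the biggest single lever for tasks it can handle.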

Common Mistakes

  • Context explosion: Sending full conversation history without truncation. Summarise or drop old turns.
  • Verbose system prompts: Long system messages are billed as input every request. Keep them lean or cache them.
  • No truncation: Failing to limit output length leads to unnecessary tokens. Always set reasonable max_tokens.
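A minimal sketch of the history-truncation fix: keep only the most recent turns that fit a token budget. `estimate_tokens` is a crude word-count stand-in; a real implementation would use the model's tokenizer (e.g. tiktoken):

```python
def estimate_tokens(text: str) -> int:
    """Rough proxy: words ~ tokens. Replace with a real tokenizer."""
    return len(text.split())

def truncate_history(messages, budget):
    """Keep the most recent messages whose combined size fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):       # walk newest-first
        t = estimate_tokens(msg["content"])
        if used + t > budget:
            break                        # oldest remaining turns are dropped
        kept.append(msg)
        used += t
    return list(reversed(kept))          # restore chronological order
```

Summarising dropped turns into a single short message (instead of discarding them) is a common refinement when older context still matters.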

Monthly Workload Example

Assume 100,000 chats/month, with 1,000 input tokens and 250 output tokens per chat on average:

  • 100M input tokens + 25M output tokens monthly
Model          Calculation                        Monthly Cost
GPT-5 Mini     (100 × $0.25) + (25 × $2.00)       $75
GPT-5.2        (100 × $1.75) + (25 × $14.00)      $525
GPT-5.2 Pro    (100 × $21.00) + (25 × $168.00)    $6,300

Async batch version: ~50% less if your workload is eligible for the Batch API.
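The batch-eligible version of that table, recomputed with the 50% discount applied (rates are the article's published figures):

```python
# (input rate, output rate) per 1M tokens, from the pricing table above.
rates = {
    "GPT-5 Mini":  (0.25, 2.00),
    "GPT-5.2":     (1.75, 14.00),
    "GPT-5.2 Pro": (21.00, 168.00),
}

def monthly(r_in, r_out, batch=False):
    total = 100 * r_in + 25 * r_out   # 100M input, 25M output (in millions)
    return total * 0.5 if batch else total

batch_costs = {m: monthly(*r, batch=True) for m, r in rates.items()}
# GPT-5 Mini: $37.50, GPT-5.2: $262.50, GPT-5.2 Pro: $3,150.00
```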

Planning Your LLM API Budget?

Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.

Book a Free Strategy Call

Frequently Asked Questions

How does OpenAI API pricing work?
OpenAI charges per million tokens for input (prompt + context) and output (model response). Input and cached input have different rates — cached input is roughly 10x cheaper. Output tokens are typically 5–10x more expensive than input, so output-heavy workloads cost more.

What is the Batch API, and how much does it save?
The Batch API lets you submit non-real-time jobs (summarisation, classification, extraction) and receive results within 24 hours. You save 50% on both input and output tokens compared to real-time API calls. Ideal for workloads that don't require immediate responses.

When should I use prompt caching?
Use prompt caching when you repeatedly send the same context — for example, long system prompts, RAG document chunks, or shared knowledge bases. The first request pays the full input rate; subsequent requests that hit the cache pay the much lower cached rate.

How can I reduce my OpenAI API costs?
Four main levers: (1) Use caching for repeated context, (2) Use the Batch API for non-real-time jobs (50% off), (3) Choose smaller models (GPT-5 mini) for simple tasks, (4) Limit output length with max_tokens and avoid context explosion by truncating conversation history.

Related reading

Claude API pricing 2026: Opus, Sonnet, and Haiku token costs. Caching, rate limits, and real workload examples for developers.

Read next: Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Related Resources

Article
AI API Pricing Comparison (2026)

Compare OpenAI, Claude, and Gemini token costs and real usage examples.

Guide
The Complete Guide to AI Workflow Automation

Learn how to identify automation opportunities, choose the right tools, and measure ROI.

Consulting
AI Customer Support Automation

See how AI automation can resolve 70% of customer support tickets automatically.

Case Study
AI Workflow Automation Case Study

Real-world example of AI automation delivering measurable results.
