OpenAI API Pricing Explained (2026): Cost per Token & Real Examples

OpenAI Pricing at a Glance
Understanding OpenAI's pricing model is essential for anyone building with the API. Costs are driven by tokens — both input (what you send) and output (what the model generates) — with significant savings available through caching and the Batch API.
| Model | Input (per 1M tokens) | Cached Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| GPT-5.2 | $1.75 | $0.175 | $14.00 |
| GPT-5.2 Pro | $21.00 | — | $168.00 |
| GPT-5 Mini | $0.25 | $0.025 | $2.00 |
Batch API: Save 50% on inputs and outputs for jobs that complete within 24 hours. Ideal for summarisation, classification, and other non-real-time workloads.
How Token Billing Works
OpenAI charges separately for three token types:
- Input tokens: Your prompt, system message, and any context you send. Billed at the standard input rate.
- Cached input tokens: When you use prompt caching, repeated context (e.g. long documents, RAG chunks) is billed at a much lower rate once cached. Cache hits are cheaper than fresh input.
- Output tokens: The model's response. Output is typically 5–10x more expensive than input (8x for the models in the table above), so verbose outputs drive cost quickly.
The ratio of input to output matters. A chatbot with 1,000 input tokens and 250 output tokens per turn has a 4:1 input-to-output ratio. Agentic workflows with long tool outputs can flip that ratio and become output-heavy — and expensive.
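To see why the ratio matters, here is a minimal sketch comparing an input-heavy turn with an output-heavy turn of the same total size, using the GPT-5.2 rates from the table above:

```python
IN_RATE, OUT_RATE = 1.75, 14.00  # GPT-5.2, $ per 1M tokens

def turn_cost(in_tok: int, out_tok: int) -> float:
    """Dollar cost of a single request at the given token counts."""
    return in_tok / 1e6 * IN_RATE + out_tok / 1e6 * OUT_RATE

chat  = turn_cost(1_000, 250)   # input-heavy, 4:1 ratio
agent = turn_cost(250, 1_000)   # same 1,250 total tokens, output-heavy
print(round(agent / chat, 2))   # the output-heavy turn costs 2.75x more
```

Same token budget, very different bill: because output is 8x the input rate here, flipping the ratio nearly triples the per-turn cost.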
Cost Calculator Formula
Use this formula to estimate monthly cost:
```
monthly_cost = (input_tokens / 1_000_000 * input_rate)
             + (output_tokens / 1_000_000 * output_rate)

// Example: 100M input + 25M output on GPT-5.2
// (100 * 1.75) + (25 * 14) = 175 + 350 = $525/month
```
For cached input, substitute the cached rate for the portion of input that hits the cache. Batch API cuts both input and output rates by 50%.
Three Real Workloads
1. Chatbot (1,000 in / 250 out per turn)
High volume, moderate context. Caching helps if you reuse system prompts or RAG chunks. GPT-5 Mini is often sufficient for simple Q&A; reserve GPT-5.2 for complex reasoning.
2. Summariser (5,000 in / 500 out per doc)
Input-heavy. Batch API is ideal — summaries don't need real-time latency. Caching document chunks across batches can further reduce cost.
3. Agentic Workflow (2,000 in / 1,500 out per step, 5 steps)
Output-heavy. Tool results and long reasoning add up. Consider smaller models for routing and tool use; reserve flagship models for final synthesis.
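Pricing the three workloads at GPT-5.2 rates makes the difference concrete. A rough per-run sketch, using the table's rates:

```python
GPT_52 = {"input": 1.75, "output": 14.00}  # $ per 1M tokens

def run_cost(in_tok: int, out_tok: int, rates: dict, steps: int = 1) -> float:
    """Cost of one run: per-step token cost times the number of steps."""
    per_step = in_tok / 1e6 * rates["input"] + out_tok / 1e6 * rates["output"]
    return per_step * steps

chatbot    = run_cost(1_000, 250, GPT_52)       # ~$0.0053 per turn
summariser = run_cost(5_000, 500, GPT_52)       # ~$0.0158 per document
agent      = run_cost(2_000, 1_500, GPT_52, 5)  # ~$0.1225 per 5-step run
```

One agent run costs roughly as much as 23 chatbot turns, which is why routing steps to a smaller model pays off so quickly.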
Cost Levers
- Caching: Cache system prompts, RAG chunks, and repeated context. Cached input is ~10x cheaper than fresh input.
- Batch API: 50% off for non-real-time jobs. Run summarisation, classification, and extraction in batch.
- Smaller model: GPT-5 Mini is 7x cheaper than GPT-5.2 on both input and output. Use it for simple tasks.
- Shorter outputs: Set `max_tokens` and use structured outputs to avoid runaway generation.
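To gauge how much the caching lever alone is worth, here is a back-of-the-envelope sketch. The 2,000-token system prompt and 100,000 monthly requests are hypothetical numbers; the rates are GPT-5.2's fresh and cached input prices from the table:

```python
# Savings from caching a repeated system prompt (GPT-5.2 input rates)
PROMPT_TOKENS = 2_000        # hypothetical system prompt length
REQUESTS = 100_000           # hypothetical monthly request volume
FRESH, CACHED = 1.75, 0.175  # $ per 1M input tokens

repeated_m = PROMPT_TOKENS * REQUESTS / 1e6  # 200M tokens of repeated context
savings = repeated_m * (FRESH - CACHED)
print(f"Monthly saving: ${savings:,.2f}")    # Monthly saving: $315.00
```

At 10x cheaper cached input, a long system prompt served to every request is one of the easiest line items to cut.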
Common Mistakes
- Context explosion: Sending full conversation history without truncation. Summarise or drop old turns.
- Verbose system prompts: Long system messages are billed as input every request. Keep them lean or cache them.
- No truncation: Failing to limit output length leads to unnecessary tokens. Always set a reasonable `max_tokens`.
Monthly Workload Example
Assume 100,000 chats/month, with 1,000 input tokens and 250 output tokens per chat on average:
- 100M input tokens + 25M output tokens monthly
| Model | Calculation | Monthly Cost |
|---|---|---|
| GPT-5 Mini | (100 × $0.25) + (25 × $2.00) | $75 |
| GPT-5.2 | (100 × $1.75) + (25 × $14) | $525 |
| GPT-5.2 Pro | (100 × $21) + (25 × $168) | $6,300 |
Async batch version: ~50% less if your workload is eligible for the Batch API.
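The table's arithmetic can be reproduced in a few lines; the rates are the per-1M-token prices from the pricing table at the top:

```python
# (input rate, output rate) in $ per 1M tokens
rates = {
    "GPT-5 Mini":  (0.25, 2.00),
    "GPT-5.2":     (1.75, 14.00),
    "GPT-5.2 Pro": (21.00, 168.00),
}

# 100M input tokens and 25M output tokens per month
costs = {model: 100 * inp + 25 * out for model, (inp, out) in rates.items()}

for model, cost in costs.items():
    print(f"{model}: ${cost:,.0f}/month")
```

The spread is the headline: the same workload runs at $75, $525, or $6,300 a month depending purely on model choice.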
Planning Your LLM API Budget?
Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.
Book a Free Strategy Call
Related reading
Claude API pricing 2026: Opus, Sonnet, and Haiku token costs. Caching, rate limits, and real workload examples for developers.
Read next: Claude API Pricing Breakdown (2026): Tokens, Limits & Costs →