Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Claude Pricing at a Glance
Anthropic's Claude API pricing is model-specific, with input and output tokens billed separately at per-million-token rates. Output tokens cost five times as much as input tokens across the current lineup, so output-heavy workloads can feel expensive — but caching and smart model selection help.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
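A minimal sketch of how these rates translate into per-request cost. The model ID strings below are illustrative labels, not official API identifiers; the rates are the ones from the table above.

```python
# (input, output) rates in USD per 1M tokens, from the pricing table above.
# Keys are illustrative labels, not official API model IDs.
RATES = {
    "claude-opus-4.6": (5.00, 25.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "claude-haiku-4.5": (1.00, 5.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single request at the table rates."""
    rate_in, rate_out = RATES[model]
    return (input_tokens / 1_000_000) * rate_in + (output_tokens / 1_000_000) * rate_out

# A 1,000-in / 250-out chat on Sonnet:
# estimate_cost("claude-sonnet-4.6", 1_000, 250)  # ≈ $0.00675
```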
Why Claude Feels Expensive
Claude's output pricing is higher than input — a 5:1 ratio across all three models above. That means:
- Chatbots with long replies cost more than you might expect
- Agentic workflows with verbose tool outputs and reasoning steps add up quickly
- Summarisation is more input-heavy, so it's relatively cheaper
If your workload is output-heavy, consider Haiku for simpler tasks and reserve Opus for complex reasoning. Caching also reduces input cost when you reuse context.
Caching Explained
Claude supports prompt caching with distinct pricing for:
- Cache writes: The first request that populates the cache. Billed at a premium.
- Cache hits and refreshes: Subsequent requests that use the cached content. Billed at a much lower rate — often 90% cheaper than fresh input.
Financially, caching pays off when you send the same context repeatedly — for example, a long system prompt, RAG chunks, or document context. The break-even depends on how often the cache is hit; for high-volume workloads with stable context, savings are substantial.
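The break-even logic above can be sketched numerically. The write premium and hit discount below (1.25x and 0.1x of the standard input rate) are assumptions consistent with the "often 90% cheaper" figure, not quoted rates — check Anthropic's current caching pricing before relying on them.

```python
def cached_vs_fresh(context_tokens: int, requests: int,
                    rate_in: float = 3.00,     # Sonnet input, USD per 1M tokens
                    write_mult: float = 1.25,  # assumed cache-write premium
                    hit_mult: float = 0.10):   # assumed hit rate (90% cheaper)
    """Compare input cost for a reused context sent fresh vs. cached.

    Assumes the first request pays the write premium and every
    subsequent request is a cache hit.
    """
    millions = context_tokens / 1_000_000
    fresh = requests * millions * rate_in
    cached = millions * rate_in * (write_mult + (requests - 1) * hit_mult)
    return fresh, cached

# A 50k-token system prompt reused across 100 requests:
# fresh ≈ $15.00 vs cached ≈ $1.67 — caching already wins by the second request.
```

Under these assumed multipliers the cache pays for itself as soon as the context is reused once: one write plus one hit (1.35x) is cheaper than two fresh sends (2x).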
Rate Limits and Quota Tiers
Rate limits vary by model and tier. Anthropic documents RPM (requests per minute) and TPM (tokens per minute). Quotas reset on a schedule (typically midnight Pacific). If you hit limits, consider:
- Upgrading your tier for higher throughput
- Using Haiku for high-volume, low-complexity tasks
- Implementing exponential backoff and request queuing
Check the official rate limits documentation for your project's current limits.
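A minimal sketch of the backoff pattern from the list above. `RateLimitError` is a hypothetical stand-in for whatever exception your SDK raises on a 429; swap in the real one.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the SDK's rate-limit (429) exception."""

def with_backoff(call, max_retries: int = 5, base: float = 1.0):
    """Retry a zero-argument callable with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Delay doubles each attempt; jitter spreads out retry bursts.
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))
```

Jitter matters: if many workers hit the limit at once, identical delays make them all retry at the same instant and hit it again.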
Real Workload Example (Apples to Apples)
Same scenario as the OpenAI article: 100,000 chats/month, 1,000 input tokens and 250 output tokens per chat (100M input + 25M output monthly).
| Model | Calculation | Monthly Cost |
|---|---|---|
| Claude Haiku 4.5 | (100 × $1) + (25 × $5) | $225 |
| Claude Sonnet 4.6 | (100 × $3) + (25 × $15) | $675 |
| Claude Opus 4.6 | (100 × $5) + (25 × $25) | $1,125 |
Compared to OpenAI's GPT-5.2 ($525) and GPT-5 mini ($75), Claude's mid-tier Sonnet ($675) comes in above both, while Haiku ($225) sits between them and is competitive for high-volume, simple workloads.
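The monthly figures in the table are straightforward to reproduce from the per-million rates:

```python
# (input, output) rates in USD per 1M tokens, from the pricing table.
MODELS = {
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}

def monthly_cost(rate_in, rate_out, m_in=100, m_out=25):
    """USD cost for m_in / m_out millions of tokens per month."""
    return m_in * rate_in + m_out * rate_out

for name, (r_in, r_out) in MODELS.items():
    print(f"{name}: ${monthly_cost(r_in, r_out):,.0f}")
# Claude Haiku 4.5: $225 / Claude Sonnet 4.6: $675 / Claude Opus 4.6: $1,125
```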
Planning Your LLM API Budget?
Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.
Related reading
Gemini API pricing: Flash and Flash-Lite token costs, thinking tokens, rate limits, and free tier. Real workload examples for developers.
Read next: Gemini API Pricing Explained: Token Costs and Free Tier →