Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Executive Summary
The Claude Developer Platform from Anthropic prices the API primarily by input tokens and output tokens, with cost modifiers for prompt caching, batch processing, long-context requests, and data residency routing. The primary endpoint is Messages (POST /v1/messages) on api.anthropic.com.
For the latest trio (Opus/Sonnet/Haiku), official list prices (standard speed, ≤200K input) are: Opus 4.6 $5/MTok in, $25/MTok out; Sonnet 4.6 $3/MTok in, $15/MTok out; Haiku 4.5 $1/MTok in, $5/MTok out. Output tokens are 5× the input rate, so response length is often the biggest cost lever.
For the chatbot workload (100k messages/month, 1k tokens/message, 75% input / 25% output), estimated monthly spend is ~$1,000 (Opus), ~$600 (Sonnet), or ~$200 (Haiku) at standard pricing.
Two gotchas: embeddings and fine-tuning are not first-party — Anthropic does not offer its own embedding model, and the API does not currently offer fine-tuning. Tooling adds non-token billing: web search costs $10 per 1,000 searches on top of token costs.
How Claude API Pricing Works
Claude bills for input and output tokens. Cost modifiers include:
- Prompt caching: Cache writes at 1.25× or 2× base; cache reads at 0.1× base
- Batch API: 50% discount on input and output
- Long context: Requests >200K input tokens use premium long-context rates
- Data residency: US-only inference (inference_geo) incurs a 1.1× multiplier for Opus 4.6 and newer
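As a back-of-the-envelope aid, the modifiers above can be combined into a small cost estimator. This is an illustrative sketch, not an Anthropic API: it applies the list rates per million tokens, then the 50% batch discount and the 1.1× US-only multiplier.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_rate, output_rate,
                      batch=False, us_only=False):
    """Rough cost from USD-per-MTok list rates.

    batch   -> 50% discount on input and output tokens
    us_only -> 1.1x data-residency multiplier (Opus 4.6 and newer)
    """
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    if batch:
        cost *= 0.5
    if us_only:
        cost *= 1.1
    return cost

# Sonnet 4.6 list rates: $3/MTok in, $15/MTok out
print(estimate_cost_usd(750, 250, 3.0, 15.0))  # → 0.006
```

Real bills depend on the token counts the API reports back, so treat this as a planning tool only.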
Token Pricing Table
| Model | Input (USD/1M) | Output (USD/1M) | 5m Cache Write | 1h Cache Write | Cache Hit |
|---|---|---|---|---|---|
| Opus 4.6 | $5.00 | $25.00 | $6.25 | $10.00 | $0.50 |
| Sonnet 4.6 | $3.00 | $15.00 | $3.75 | $6.00 | $0.30 |
| Haiku 4.5 | $1.00 | $5.00 | $1.25 | $2.00 | $0.10 |
Caching pays off quickly: because reads cost only 0.1× the base input rate, the 5-minute cache (1.25× write) breaks even after a single read and the 1-hour cache (2× write) after two. Caching multipliers stack with batch discounts and data residency routing.
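The break-even claim can be checked with a little arithmetic. The helper below is a hypothetical illustration; rates are USD per MTok from the table above.

```python
def cache_breakeven_reads(base_rate, write_mult, read_mult=0.1):
    """Reads needed before caching beats re-sending the same
    tokens at the base input rate every time."""
    write_premium = base_rate * (write_mult - 1)    # extra cost of the cache write
    saving_per_read = base_rate * (1 - read_mult)   # saved vs. uncached input
    reads, cost_delta = 0, write_premium
    while cost_delta > 0:
        reads += 1
        cost_delta -= saving_per_read
    return reads

print(cache_breakeven_reads(3.0, 1.25))  # 5-minute cache → 1
print(cache_breakeven_reads(3.0, 2.0))   # 1-hour cache  → 2
```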
Batch API: 50% Cost Reduction
Anthropic offers a 50% discount for the Batch API on both input and output tokens:
| Model | Batch Input (USD/1M) | Batch Output (USD/1M) |
|---|---|---|
| Opus 4.6 | $2.50 | $12.50 |
| Sonnet 4.6 | $1.50 | $7.50 |
| Haiku 4.5 | $0.50 | $2.50 |
Batch is ideal for offline workloads (evaluations, bulk summarisation, classification). Batches are limited to 100,000 requests or 256 MB, and results remain available for 29 days. Fast mode is not available with the Batch API.
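The 100,000-request cap is easy to respect by chunking client-side before submission. A minimal sketch — the helper is illustrative and ignores the separate 256 MB size limit:

```python
def chunk_batch(requests, max_requests=100_000):
    """Split a list of batch requests into slices that each
    stay within the per-batch request limit."""
    return [requests[i:i + max_requests]
            for i in range(0, len(requests), max_requests)]

reqs = list(range(250_000))  # stand-ins for request payloads
print([len(b) for b in chunk_batch(reqs)])  # → [100000, 100000, 50000]
```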
Model Families and Capabilities
- Opus: Highest capability for complex agentic tasks and coding. 1M context (beta), up to 128K output.
- Sonnet: Balance of speed and intelligence. Often the best default for most applications.
- Haiku: Fastest and most economical for high-volume, cost-sensitive deployments.
Extended thinking (Opus 4.6, Sonnet 4.6): thinking budget tokens are billed as output. Enabling extended thinking can materially increase output cost.
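Since thinking tokens bill at the output rate, the output-side cost of a thinking-enabled request can be sketched as follows (illustrative helper; the 10K thinking budget is an assumed figure, not a recommendation):

```python
def output_cost_with_thinking(visible_tokens, thinking_tokens, output_rate):
    """Thinking tokens are billed at the output rate,
    on top of the visible output tokens."""
    return (visible_tokens + thinking_tokens) * output_rate / 1_000_000

# Sonnet 4.6 ($15/MTok out): 500 visible tokens + 10K thinking budget
print(output_cost_with_thinking(500, 10_000, 15.0))  # → 0.1575
```

Note how the thinking budget dominates: the 500 visible tokens cost $0.0075, while the thinking tokens add $0.15.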
Vision: Image Token Estimation
Claude supports image + text input. Image tokens are estimated as:
image_tokens ≈ (width_px × height_px) / 750
A 1000×1000 image works out to ≈1,334 tokens. Resize images to the smallest dimensions that preserve the detail you need — oversized images add tokens and latency without improving results.
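The estimation formula translates directly into a small helper (rounding up for partial tokens is my assumption):

```python
import math

def image_tokens(width_px, height_px):
    """Approximate image token count: (width * height) / 750, rounded up."""
    return math.ceil(width_px * height_px / 750)

print(image_tokens(1000, 1000))  # → 1334
print(image_tokens(512, 512))    # → 350
```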
Real Cost Examples
Chatbot: 100k messages/month, 1k tokens/msg, 75% input / 25% output
75M input + 25M output tokens monthly.
| Model | Monthly Cost |
|---|---|
| Haiku 4.5 | ~$200 |
| Sonnet 4.6 | ~$600 |
| Opus 4.6 | ~$1,000 |
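The table above can be reproduced from the list rates — a worked check using the article's figures:

```python
rates = {  # USD per MTok: (input, output)
    "Haiku 4.5": (1.0, 5.0),
    "Sonnet 4.6": (3.0, 15.0),
    "Opus 4.6": (5.0, 25.0),
}

msgs, tok_per_msg, out_share = 100_000, 1_000, 0.25
total = msgs * tok_per_msg                       # 100M tokens/month
in_tok = total * (1 - out_share)                 # 75M input
out_tok = total * out_share                      # 25M output

for model, (in_rate, out_rate) in rates.items():
    cost = (in_tok * in_rate + out_tok * out_rate) / 1e6
    print(f"{model}: ${cost:,.0f}")
```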
Multimodal: 10k images/month (1000×1000, 50 text in + 200 out)
~1,334 image tokens + 50 text = 1,384 input per request.
| Model | Monthly Cost |
|---|---|
| Haiku 4.5 | ~$24 |
| Sonnet 4.6 | ~$72 |
| Opus 4.6 | ~$119 |
Embeddings
Anthropic does not offer a first-party embedding model. Use an external provider (e.g. Voyage AI) for RAG and retrieval pipelines. Embedding costs are not part of the Claude bill.
Sensitivity: Token Volume and Output Share
At 100k messages/month, cost varies with tokens per message and output share:
| | 500 tok/msg | 1000 tok/msg | 2000 tok/msg |
|---|---|---|---|
| Haiku (25% out) | $100 | $200 | $400 |
| Sonnet (25% out) | $300 | $600 | $1,200 |
| Opus (25% out) | $500 | $1,000 | $2,000 |
Output share matters just as much: at 1,000 tok/msg on Sonnet, shifting output from 15% to 40% raises the monthly bill from ~$480 to ~$780 — roughly 60% more for the same message volume.
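The output-share effect can be verified directly: for Sonnet 4.6 at 1,000 tokens/message, moving output from 15% to 40% takes the monthly bill from $480 to $780, about 62% more. An illustrative helper:

```python
def monthly_cost(messages, tok_per_msg, out_share, in_rate, out_rate):
    """Monthly USD cost given a fixed output share of total tokens."""
    total = messages * tok_per_msg
    return (total * (1 - out_share) * in_rate
            + total * out_share * out_rate) / 1_000_000

# Sonnet 4.6 ($3/$15 per MTok), 100k messages x 1,000 tokens
low = monthly_cost(100_000, 1_000, 0.15, 3.0, 15.0)
high = monthly_cost(100_000, 1_000, 0.40, 3.0, 15.0)
print(round(low), round(high))  # → 480 780
```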
Hidden Costs and Tooling
Web search: $10 per 1,000 searches, plus standard token costs for search-generated content.
Tool overhead: Enabling tools adds system prompt tokens (~313–346 depending on config). Tool schemas and tool result blocks count as input/output.
Code execution: Free when used with web search or web fetch; otherwise billed by execution time with a minimum runtime and monthly free allowance.
Long-context premium: If you enable 1M context and total input exceeds 200K tokens, the entire request switches to premium long-context pricing. The 200K threshold includes cache reads/writes.
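A sketch of the threshold check as described — note that cache reads and writes count toward the 200K cutoff (hypothetical helper, not SDK code):

```python
def uses_long_context_rate(fresh_input, cache_reads, cache_writes,
                           threshold=200_000):
    """True when the whole request bills at long-context rates:
    the >200K check includes cache reads and writes."""
    return (fresh_input + cache_reads + cache_writes) > threshold

print(uses_long_context_rate(150_000, 60_000, 0))  # → True (210K total)
print(uses_long_context_rate(200_000, 0, 0))       # → False (not strictly over)
```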
Data residency: US-only inference adds 1.1× multiplier for Opus 4.6 and newer models.
Rate Limits
Anthropic measures limits in RPM (requests/minute), ITPM (input tokens/minute), and OTPM (output tokens/minute). Cached read tokens typically do not count toward ITPM. Exact values vary by tier and model — check the console for your project.
Tier progression: Tier 1 $5 cumulative credit; Tier 2 $40; Tier 3 $200; Tier 4 $400.
How to Reduce Claude API Costs
- Model routing: Default to Haiku or Sonnet; escalate to Opus only for complex reasoning, high-stakes coding, or hard tool orchestration.
- Output control: Output is 5× input price. Shorter answers, structured outputs, and retrieval-first responses yield outsized savings.
- Prompt caching: Cache large stable instructions, tool catalogs, and long documents. Cached reads are 0.1× base and often don't count toward ITPM.
- Batch API: Use for non-interactive workloads — 50% cost reduction, most batches complete in under an hour.
- Image resizing: Keep images below resizing thresholds; use the token formula to budget vision costs early.
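Model routing can start as simply as a lookup keyed on task complexity. A toy sketch — the tiers and labels are illustrative, not a recommendation from Anthropic:

```python
def pick_model(task_complexity: str) -> str:
    """Cheapest model that clears the task's bar; falls back to
    the balanced default when the label is unknown."""
    tiers = {
        "simple": "Haiku 4.5",     # classification, extraction, FAQs
        "standard": "Sonnet 4.6",  # most chat and coding tasks
        "complex": "Opus 4.6",     # hard reasoning, agentic workflows
    }
    return tiers.get(task_complexity, "Sonnet 4.6")

print(pick_model("simple"))   # → Haiku 4.5
print(pick_model("complex"))  # → Opus 4.6
```

In practice the complexity label would come from a classifier or heuristics (prompt length, tool count, retry history), but even this static split captures most of the savings.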
Related
For a side-by-side comparison with OpenAI and Gemini, see AI API Pricing Comparison (2026).
Planning Your LLM API Budget?
Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.
Cite this article
- Title: Claude API Pricing Breakdown (2026): Tokens, Limits & Costs
- Author: Nicola Lazzari
- Published: March 2, 2026
- Updated: March 2026
- URL: https://nicolalazzari.ai/articles/claude-api-pricing-breakdown-2026
- Website: nicolalazzari.ai
- Suggested citation: Nicola Lazzari. Claude API Pricing Breakdown (2026): Tokens, Limits & Costs. nicolalazzari.ai, updated March 2026.
Sources used
Primary sources
- Anthropic pricing: https://www.anthropic.com/pricing
- Anthropic docs: https://docs.anthropic.com/
AI-Readable Summary
- Claude API pricing 2026: Opus, Sonnet, Haiku token costs. Prompt caching, Batch API 50% off, tooling, multimodal, no embeddings, and cost optimisation for developers.
Related reading
- Gemini API Pricing Explained: Token Costs and Free Tier — Flash-Lite, Flash, and Pro token costs, Batch API 50% off, context caching, embeddings, multimodal, and free-tier caveats.
- AI API Pricing Comparison (2026) — OpenAI, Claude, and Gemini token costs compared with real usage examples.