Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Nicola Lazzari
[Figure: Claude API pricing comparison showing Opus, Sonnet, and Haiku token costs]

Executive Summary

The Claude Developer Platform from Anthropic prices the API primarily by input tokens and output tokens, with cost modifiers for prompt caching, batch processing, long-context requests, and data residency routing. The primary endpoint is Messages (POST /v1/messages) on api.anthropic.com.

For the latest trio (Opus/Sonnet/Haiku), official list prices (standard speed, ≤200K input) are: Opus 4.6 $5/MTok in, $25/MTok out; Sonnet 4.6 $3/MTok in, $15/MTok out; Haiku 4.5 $1/MTok in, $5/MTok out. Output tokens are 5× the input rate, so response length is often the biggest cost lever.

For the chatbot workload (100k messages/month, 1k tokens/message, 75% input / 25% output), estimated monthly spend is ~$1,000 (Opus), ~$600 (Sonnet), or ~$200 (Haiku) at standard pricing.

Two gotchas: Embeddings and fine-tuning are not first-party — Anthropic does not offer its own embedding model, and the API does not currently offer fine-tuning. Tooling adds non-token billing: web search is $10/1,000 searches on top of token costs.


How Claude API Pricing Works

Claude bills for input and output tokens. Cost modifiers include:

  • Prompt caching: Cache writes at 1.25× or 2× base; cache reads at 0.1× base
  • Batch API: 50% discount on input and output
  • Long context: Requests >200K input tokens use premium long-context rates
  • Data residency: US-only inference (inference_geo) incurs 1.1× multiplier for Opus 4.6 and newer
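
The modifiers above can be combined into a single per-request estimate. A minimal sketch, using the list prices quoted in this article (the `RATES` table and function names are mine, for illustration only, not a billing reference):

```python
# Estimated request cost under the stated modifiers.
RATES = {  # (input, output) USD per MTok, standard speed, <=200K context
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}

def request_cost(model, input_tokens, output_tokens,
                 cache_write_5m=0, cache_write_1h=0, cache_read=0,
                 batch=False, us_only=False):
    """Estimate USD cost of one request under the article's multipliers."""
    in_rate, out_rate = RATES[model]
    cost = (input_tokens * in_rate
            + output_tokens * out_rate
            + cache_write_5m * in_rate * 1.25   # 5m cache write: 1.25x base
            + cache_write_1h * in_rate * 2.0    # 1h cache write: 2x base
            + cache_read * in_rate * 0.1) / 1e6 # cache hit: 0.1x base
    if batch:
        cost *= 0.5      # Batch API: 50% off
    if us_only:
        cost *= 1.1      # US-only inference_geo: 1.1x (Opus 4.6 and newer)
    return cost
```

Whether the 1.1× residency multiplier applies to cache rates as well is not stated here; treat that interaction as an assumption to verify against current docs.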

Token Pricing Table

Model        Input (USD/1M)   Output (USD/1M)   5m Cache Write   1h Cache Write   Cache Hit
Opus 4.6     $5.00            $25.00            $6.25            $10.00           $0.50
Sonnet 4.6   $3.00            $15.00            $3.75            $6.00            $0.30
Haiku 4.5    $1.00            $5.00             $1.25            $2.00            $0.10

Caching pays off quickly: a 5-minute cache write breaks even after a single cache read, a 1-hour write after two reads. Cache multipliers stack with the batch discount and the data-residency multiplier.
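
The break-even point follows directly from the multipliers: a cached prompt costs `write_multiplier + reads × 0.1` in base-rate units versus `1 + reads` uncached. A small sketch (function name is mine):

```python
import math

def breakeven_reads(write_multiplier, read_multiplier=0.1):
    """Smallest number of cache reads at which caching beats resending.

    Caching wins when write + r*read < 1 + r, i.e. r > (write - 1) / (1 - read).
    """
    return math.floor((write_multiplier - 1) / (1 - read_multiplier)) + 1

print(breakeven_reads(1.25))  # 5-minute cache -> 1
print(breakeven_reads(2.0))   # 1-hour cache  -> 2
```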


Batch API: 50% Cost Reduction

Anthropic offers a 50% discount for the Batch API on both input and output tokens:

Model        Batch Input (USD/1M)   Batch Output (USD/1M)
Opus 4.6     $2.50                  $12.50
Sonnet 4.6   $1.50                  $7.50
Haiku 4.5    $0.50                  $2.50

Batch is ideal for offline workloads (evaluations, bulk summarisation, classification). Batches are limited to 100,000 requests or 256MB; results available for 29 days. Fast mode is not available with Batch.
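
As a rough sketch of the saving on a typical offline job, using the discounted rates above (illustrative list prices; function and variable names are mine):

```python
def job_cost(in_mtok, out_mtok, in_rate, out_rate, batch=False):
    """USD cost of a job given token volumes (MTok) and standard rates."""
    factor = 0.5 if batch else 1.0  # Batch API: 50% off input and output
    return factor * (in_mtok * in_rate + out_mtok * out_rate)

# 10M input + 2M output tokens of bulk classification on Haiku 4.5:
standard = job_cost(10, 2, 1.00, 5.00)              # $20.00
batched  = job_cost(10, 2, 1.00, 5.00, batch=True)  # $10.00
```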


Model Families and Capabilities

  • Opus: Highest capability for complex agentic tasks and coding. 1M context (beta), up to 128K output.
  • Sonnet: Balance of speed and intelligence. Often the best default for most applications.
  • Haiku: Fastest and most economical for high-volume, cost-sensitive deployments.

Extended thinking (Opus 4.6, Sonnet 4.6): thinking budget tokens are billed as output. Enabling extended thinking can materially increase output cost.


Vision: Image Token Estimation

Claude supports image + text input. Image tokens are estimated as:

image_tokens ≈ (width_px × height_px) / 750

A 1000×1000 image ≈ 1,334 tokens. Resize large images client-side before sending them: the API downscales oversized images anyway, so extra pixels add latency and upload time without improving quality.
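
The estimation formula above as a helper (the ÷750 divisor is the approximation quoted in this article; actual billed tokens may differ slightly):

```python
import math

def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate input tokens for one image: (width * height) / 750."""
    return math.ceil(width_px * height_px / 750)

print(image_tokens(1000, 1000))  # 1334
```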


Real Cost Examples

Chatbot: 100k messages/month, 1k tokens/msg, 75% input / 25% output

75M input + 25M output tokens monthly.

Model        Monthly Cost
Haiku 4.5    ~$200
Sonnet 4.6   ~$600
Opus 4.6     ~$1,000
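
The chatbot estimate reproduces directly from the pricing table (variable names are mine):

```python
MESSAGES = 100_000
TOKENS_PER_MSG = 1_000
IN_MTOK = MESSAGES * TOKENS_PER_MSG * 0.75 / 1e6   # 75 MTok input / month
OUT_MTOK = MESSAGES * TOKENS_PER_MSG * 0.25 / 1e6  # 25 MTok output / month

RATES = {  # (input, output) USD per MTok
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.6": (5.00, 25.00),
}
monthly = {m: IN_MTOK * i + OUT_MTOK * o for m, (i, o) in RATES.items()}
# monthly -> {'Haiku 4.5': 200.0, 'Sonnet 4.6': 600.0, 'Opus 4.6': 1000.0}
```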

Multimodal: 10k images/month (1000×1000, 50 text in + 200 out)

~1,334 image tokens + 50 text = 1,384 input per request.

Model        Monthly Cost
Haiku 4.5    ~$24
Sonnet 4.6   ~$72
Opus 4.6     ~$119

Embeddings

Anthropic does not offer a first-party embedding model. Use an external provider (e.g. Voyage AI) for RAG and retrieval pipelines. Embedding costs are not part of the Claude bill.


Sensitivity: Token Volume and Output Share

At 100k messages/month, cost varies with tokens per message and output share:

                   500 tok/msg   1000 tok/msg   2000 tok/msg
Haiku (25% out)    $100          $200           $400
Sonnet (25% out)   $300          $600           $1,200
Opus (25% out)     $500          $1,000         $2,000

Shifting output share from 15% to 40% raises total cost by roughly 63% at these rates, because output is billed at 5× the input rate.
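
The output-share effect is easy to check with Sonnet 4.6 at 100k messages × 1k tokens ($3 in / $15 out per MTok; function name is mine):

```python
def monthly_cost(out_share, total_mtok=100, in_rate=3.0, out_rate=15.0):
    """Monthly USD cost for a fixed token volume at a given output share."""
    return round(total_mtok * ((1 - out_share) * in_rate + out_share * out_rate), 2)

print(monthly_cost(0.15))  # 480.0
print(monthly_cost(0.25))  # 600.0
print(monthly_cost(0.40))  # 780.0  -> ~63% more than the 15% mix
```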


Hidden Costs and Tooling

Web search: $10 per 1,000 searches, plus standard token costs for search-generated content.

Tool overhead: Enabling tools adds system prompt tokens (~313–346 depending on config). Tool schemas and tool result blocks count as input/output.

Code execution: Free when used with web search or web fetch; otherwise billed by execution time with a minimum runtime and monthly free allowance.

Long-context premium: If you enable 1M context and total input exceeds 200K tokens, the entire request switches to premium long-context pricing. The 200K threshold includes cache reads/writes.
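
The cliff behaviour matters for budgeting: one token over the threshold reprices the whole request, not just the overflow. A sketch of that logic, where the $6/MTok long-context input rate is an assumed placeholder for a Sonnet-class model, not an official figure:

```python
LONG_CONTEXT_THRESHOLD = 200_000  # includes cache reads/writes

def input_rate(total_input_tokens, base_in=3.0, long_in=6.0):
    """Effective per-MTok input rate for one request: standard or premium."""
    return long_in if total_input_tokens > LONG_CONTEXT_THRESHOLD else base_in

print(input_rate(200_000))  # 3.0  (at the threshold: standard pricing)
print(input_rate(200_001))  # 6.0  (one token over: entire request at premium)
```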

Data residency: US-only inference adds 1.1× multiplier for Opus 4.6 and newer models.


Rate Limits

Anthropic measures limits in RPM (requests/minute), ITPM (input tokens/minute), and OTPM (output tokens/minute). Cached read tokens typically do not count toward ITPM. Exact values vary by tier and model — check the console for your project.

Tier progression is driven by cumulative credit purchases: Tier 1 unlocks at $5, Tier 2 at $40, Tier 3 at $200, Tier 4 at $400.
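
When you hit RPM/ITPM/OTPM limits the API returns 429s, so production callers should retry with backoff. A minimal sketch; `RateLimitError` stands in for your SDK's 429 exception, and real responses normally carry a retry-after value you should prefer over blind backoff:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the client library's 429 error type."""
    retry_after = None  # seconds, if the server provided one

def with_backoff(call_api, max_retries=5):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except RateLimitError as e:
            # Prefer the server-provided retry-after over blind backoff.
            time.sleep(e.retry_after or (2 ** attempt + random.random()))
    raise RuntimeError("still rate limited after retries")
```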


How to Reduce Claude API Costs

  • Model routing: Default to Haiku or Sonnet; escalate to Opus only for complex reasoning, high-stakes coding, or hard tool orchestration.
  • Output control: Output is 5× input price. Shorter answers, structured outputs, and retrieval-first responses yield outsized savings.
  • Prompt caching: Cache large stable instructions, tool catalogs, and long documents. Cached reads are 0.1× base and often don't count toward ITPM.
  • Batch API: Use for non-interactive workloads — 50% cost reduction, most batches complete in under an hour.
  • Image resizing: Keep images below resizing thresholds; use the token formula to budget vision costs early.
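
The model-routing tactic above can be sketched as a simple dispatcher. The keyword heuristic and shorthand model names here are placeholders; in practice you would route on task type or a cheap classifier, and use the official model IDs:

```python
def pick_model(task: str) -> str:
    """Route cheap work to Haiku, default to Sonnet, reserve Opus for hard tasks."""
    hard_markers = ("refactor", "multi-step", "architecture", "prove")
    simple_markers = ("classify", "extract", "tag", "translate")
    t = task.lower()
    if any(m in t for m in hard_markers):
        return "opus-4.6"
    if any(m in t for m in simple_markers):
        return "haiku-4.5"
    return "sonnet-4.6"

print(pick_model("classify this support ticket"))  # haiku-4.5
print(pick_model("refactor the payments module"))  # opus-4.6
```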

Related

For a side-by-side comparison with OpenAI and Gemini, see AI API Pricing Comparison (2026).

Planning Your LLM API Budget?

Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.

Book a Free Strategy Call

Cite this article

  • Title: Claude API Pricing Breakdown (2026): Tokens, Limits & Costs
  • Author: Nicola Lazzari
  • Published: March 2, 2026
  • Updated: March 2026
  • URL: https://nicolalazzari.ai/articles/claude-api-pricing-breakdown-2026
  • Website: nicolalazzari.ai
  • Suggested citation: Nicola Lazzari. Claude API Pricing Breakdown (2026): Tokens, Limits & Costs. nicolalazzari.ai, updated March 2026.


AI-Readable Summary

  • Claude API pricing 2026: Opus, Sonnet, Haiku token costs. Prompt caching, Batch API 50% off, tooling, multimodal, no embeddings, and cost optimisation for developers.
  • Key implementation notes, trade-offs, and optimization opportunities.
  • Includes context and updates relevant to March 2026.


Updated

March 2026

Topic

Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Audience

Developers, founders, product teams

Updated for March 2026 pricing and implementation context.

This article may be referenced in research, documentation, or AI datasets. Please cite the original source when possible.

Frequently Asked Questions

How does Claude pricing compare to OpenAI?
Claude's output pricing is 5× input for the trio. For 100k messages/month (75/25 mix), Claude Sonnet ~$600 vs OpenAI GPT-5.4 ~$562.50. Claude Haiku ~$200 remains cheaper, while OpenAI's low-cost options can be lower for lightweight tasks. Claude has no first-party embeddings; OpenAI offers cheap embedding models.

How does prompt caching pricing work?
Cache writes: 5-minute TTL at 1.25× base input, 1-hour at 2×. Cache reads (hits): 0.1× base input, roughly 90% cheaper. Caching pays off with 1 read (5m cache) or 2 reads (1h cache). Cached reads typically don't count toward ITPM rate limits.

Which Claude model should I use?
Haiku: high-volume, simple tasks (classification, extraction, short Q&A). Sonnet: balanced performance and cost for most applications. Opus: complex reasoning, agentic tasks, long-form writing. Match the model to task complexity to control costs.

Does the Claude API offer batch processing?
Yes. Message Batches (POST /v1/messages/batches) offer a 50% discount on input and output tokens for async processing with up to 24h completion. Limited to 100k requests or 256MB per batch. Results available for 29 days.

What do tools cost beyond tokens?
Web search: $10 per 1,000 searches plus token costs. Web fetch: no extra charge beyond tokens. Code execution: free with web search/fetch; otherwise billed by execution time. Tool use adds ~313–346 system prompt tokens.

Related reading

Gemini API pricing 2026: Flash-Lite, Flash, Pro token costs. Batch API 50% off, context caching, embeddings, multimodal, free tier caveats, and cost optimisation for developers.

Read next: Gemini API Pricing Explained: Token Costs and Free Tier

Related Resources

Article
AI API Pricing Comparison (2026)

Compare OpenAI, Claude, and Gemini token costs and real usage examples.

Read more →
Guide
The Complete Guide to AI Workflow Automation

Learn how to identify automation opportunities, choose the right tools, and measure ROI.

Read more →
Consulting
AI Customer Support Automation

See how AI automation can resolve 70% of customer support tickets automatically.

Read more →
Case Study
AI Workflow Automation Case Study

Real-world example of AI automation delivering measurable results.

Read more →