Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Nicola Lazzari
[Figure: Claude API pricing comparison showing Opus, Sonnet, and Haiku token costs]

Executive Summary

The Claude Developer Platform from Anthropic prices the API primarily by input tokens and output tokens, with cost modifiers for prompt caching, batch processing, long-context requests, and data residency routing. The primary endpoint is Messages (POST /v1/messages) on api.anthropic.com.

For the latest trio (Opus/Sonnet/Haiku), official list prices (standard speed, ≤200K input) are: Opus 4.6 $5/MTok in, $25/MTok out; Sonnet 4.6 $3/MTok in, $15/MTok out; Haiku 4.5 $1/MTok in, $5/MTok out. Output tokens are 5× the input rate, so response length is often the biggest cost lever.

For the chatbot workload (100k messages/month, 1k tokens/message, 75% input / 25% output), estimated monthly spend is ~$1,000 (Opus), ~$600 (Sonnet), or ~$200 (Haiku) at standard pricing.

Two gotchas: Embeddings and fine-tuning are not first-party — Anthropic does not offer its own embedding model, and the API does not currently offer fine-tuning. Tooling adds non-token billing: web search is $10/1,000 searches on top of token costs.


How Claude API Pricing Works

Claude bills for input and output tokens. Cost modifiers include:

  • Prompt caching: Cache writes at 1.25× or 2× base; cache reads at 0.1× base
  • Batch API: 50% discount on input and output
  • Long context: Requests >200K input tokens use premium long-context rates
  • Data residency: US-only inference (inference_geo) incurs 1.1× multiplier for Opus 4.6 and newer
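
The modifiers above can be combined into a single per-request estimate. A minimal sketch, using the list prices quoted in this article (the `RATES` table and function names are mine, for illustration only, not a billing reference):

```python
# Estimated request cost under the stated modifiers.
RATES = {  # (input, output) USD per MTok, standard speed, <=200K context
    "opus-4.6": (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5": (1.00, 5.00),
}

def request_cost(model, input_tokens, output_tokens,
                 cache_write_5m=0, cache_write_1h=0, cache_read=0,
                 batch=False, us_only=False):
    """Estimate USD cost of one request under the article's multipliers."""
    in_rate, out_rate = RATES[model]
    cost = (input_tokens * in_rate
            + output_tokens * out_rate
            + cache_write_5m * in_rate * 1.25   # 5m cache write: 1.25x base
            + cache_write_1h * in_rate * 2.0    # 1h cache write: 2x base
            + cache_read * in_rate * 0.1) / 1e6 # cache hit: 0.1x base
    if batch:
        cost *= 0.5      # Batch API: 50% off
    if us_only:
        cost *= 1.1      # US-only inference_geo: 1.1x (Opus 4.6 and newer)
    return cost
```

Whether the 1.1× residency multiplier applies to cache rates as well is not stated here; treat that interaction as an assumption to verify against current docs.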

Token Pricing Table

Model        Input (USD/1M)   Output (USD/1M)   5m Cache Write   1h Cache Write   Cache Hit
Opus 4.6     $5.00            $25.00            $6.25            $10.00           $0.50
Sonnet 4.6   $3.00            $15.00            $3.75            $6.00            $0.30
Haiku 4.5    $1.00            $5.00             $1.25            $2.00            $0.10

Caching pays off quickly: a 5-minute cache write breaks even after a single cache read, a 1-hour write after two reads. Cache multipliers stack with the batch discount and the data-residency multiplier.
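
The break-even point follows directly from the multipliers: a cached prompt costs `write_multiplier + reads × 0.1` in base-rate units versus `1 + reads` uncached. A small sketch (function name is mine):

```python
import math

def breakeven_reads(write_multiplier, read_multiplier=0.1):
    """Smallest number of cache reads at which caching beats resending.

    Caching wins when write + r*read < 1 + r, i.e. r > (write - 1) / (1 - read).
    """
    return math.floor((write_multiplier - 1) / (1 - read_multiplier)) + 1

print(breakeven_reads(1.25))  # 5-minute cache -> 1
print(breakeven_reads(2.0))   # 1-hour cache  -> 2
```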


Batch API: 50% Cost Reduction

Anthropic offers a 50% discount for the Batch API on both input and output tokens:

Model        Batch Input (USD/1M)   Batch Output (USD/1M)
Opus 4.6     $2.50                  $12.50
Sonnet 4.6   $1.50                  $7.50
Haiku 4.5    $0.50                  $2.50

Batch is ideal for offline workloads (evaluations, bulk summarisation, classification). Batches are limited to 100,000 requests or 256MB; results available for 29 days. Fast mode is not available with Batch.
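
As a rough sketch of the saving on a typical offline job, using the discounted rates above (illustrative list prices; function and variable names are mine):

```python
def job_cost(in_mtok, out_mtok, in_rate, out_rate, batch=False):
    """USD cost of a job given token volumes (MTok) and standard rates."""
    factor = 0.5 if batch else 1.0  # Batch API: 50% off input and output
    return factor * (in_mtok * in_rate + out_mtok * out_rate)

# 10M input + 2M output tokens of bulk classification on Haiku 4.5:
standard = job_cost(10, 2, 1.00, 5.00)              # $20.00
batched  = job_cost(10, 2, 1.00, 5.00, batch=True)  # $10.00
```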


Model Families and Capabilities

  • Opus: Highest capability for complex agentic tasks and coding. 1M context (beta), up to 128K output.
  • Sonnet: Balance of speed and intelligence. Often the best default for most applications.
  • Haiku: Fastest and most economical for high-volume, cost-sensitive deployments.

Extended thinking (Opus 4.6, Sonnet 4.6): thinking budget tokens are billed as output. Enabling extended thinking can materially increase output cost.


Vision: Image Token Estimation

Claude supports image + text input. Image tokens are estimated as:

image_tokens ≈ (width_px × height_px) / 750

A 1000×1000 image ≈ 1,334 tokens. Resize large images client-side before sending them: the API downscales oversized images anyway, so extra pixels add latency and upload time without improving quality.
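
The estimation formula above as a helper (the ÷750 divisor is the approximation quoted in this article; actual billed tokens may differ slightly):

```python
import math

def image_tokens(width_px: int, height_px: int) -> int:
    """Approximate input tokens for one image: (width * height) / 750."""
    return math.ceil(width_px * height_px / 750)

print(image_tokens(1000, 1000))  # 1334
```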


Real Cost Examples

Chatbot: 100k messages/month, 1k tokens/msg, 75% input / 25% output

75M input + 25M output tokens monthly.

Model        Monthly Cost
Haiku 4.5    ~$200
Sonnet 4.6   ~$600
Opus 4.6     ~$1,000
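
The chatbot estimate reproduces directly from the pricing table (variable names are mine):

```python
MESSAGES = 100_000
TOKENS_PER_MSG = 1_000
IN_MTOK = MESSAGES * TOKENS_PER_MSG * 0.75 / 1e6   # 75 MTok input / month
OUT_MTOK = MESSAGES * TOKENS_PER_MSG * 0.25 / 1e6  # 25 MTok output / month

RATES = {  # (input, output) USD per MTok
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.6": (5.00, 25.00),
}
monthly = {m: IN_MTOK * i + OUT_MTOK * o for m, (i, o) in RATES.items()}
# monthly -> {'Haiku 4.5': 200.0, 'Sonnet 4.6': 600.0, 'Opus 4.6': 1000.0}
```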

Multimodal: 10k images/month (1000×1000, 50 text in + 200 out)

~1,334 image tokens + 50 text = 1,384 input per request.

Model        Monthly Cost
Haiku 4.5    ~$24
Sonnet 4.6   ~$72
Opus 4.6     ~$119

Embeddings

Anthropic does not offer a first-party embedding model. Use an external provider (e.g. Voyage AI) for RAG and retrieval pipelines. Embedding costs are not part of the Claude bill.


Sensitivity: Token Volume and Output Share

At 100k messages/month, cost varies with tokens per message and output share:

                   500 tok/msg   1000 tok/msg   2000 tok/msg
Haiku (25% out)    $100          $200           $400
Sonnet (25% out)   $300          $600           $1,200
Opus (25% out)     $500          $1,000         $2,000

Shifting output share from 15% to 40% raises total cost by roughly 63% at these rates, because output is billed at 5× the input rate.
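
The output-share effect is easy to check with Sonnet 4.6 at 100k messages × 1k tokens ($3 in / $15 out per MTok; function name is mine):

```python
def monthly_cost(out_share, total_mtok=100, in_rate=3.0, out_rate=15.0):
    """Monthly USD cost for a fixed token volume at a given output share."""
    return round(total_mtok * ((1 - out_share) * in_rate + out_share * out_rate), 2)

print(monthly_cost(0.15))  # 480.0
print(monthly_cost(0.25))  # 600.0
print(monthly_cost(0.40))  # 780.0  -> ~63% more than the 15% mix
```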


Hidden Costs and Tooling

Web search: $10 per 1,000 searches, plus standard token costs for search-generated content.

Tool overhead: Enabling tools adds system prompt tokens (~313–346 depending on config). Tool schemas and tool result blocks count as input/output.

Code execution: Free when used with web search or web fetch; otherwise billed by execution time with a minimum runtime and monthly free allowance.

Long-context premium: If you enable 1M context and total input exceeds 200K tokens, the entire request switches to premium long-context pricing. The 200K threshold includes cache reads/writes.
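
The cliff behaviour matters for budgeting: one token over the threshold reprices the whole request, not just the overflow. A sketch of that logic, where the $6/MTok long-context input rate is an assumed placeholder for a Sonnet-class model, not an official figure:

```python
LONG_CONTEXT_THRESHOLD = 200_000  # includes cache reads/writes

def input_rate(total_input_tokens, base_in=3.0, long_in=6.0):
    """Effective per-MTok input rate for one request: standard or premium."""
    return long_in if total_input_tokens > LONG_CONTEXT_THRESHOLD else base_in

print(input_rate(200_000))  # 3.0  (at the threshold: standard pricing)
print(input_rate(200_001))  # 6.0  (one token over: entire request at premium)
```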

Data residency: US-only inference adds 1.1× multiplier for Opus 4.6 and newer models.


Rate Limits

Anthropic measures limits in RPM (requests/minute), ITPM (input tokens/minute), and OTPM (output tokens/minute). Cached read tokens typically do not count toward ITPM. Exact values vary by tier and model — check the console for your project.

Tier progression is driven by cumulative credit purchases: Tier 1 unlocks at $5, Tier 2 at $40, Tier 3 at $200, Tier 4 at $400.
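
When you hit RPM/ITPM/OTPM limits the API returns 429s, so production callers should retry with backoff. A minimal sketch; `RateLimitError` stands in for your SDK's 429 exception, and real responses normally carry a retry-after value you should prefer over blind backoff:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for the client library's 429 error type."""
    retry_after = None  # seconds, if the server provided one

def with_backoff(call_api, max_retries=5):
    """Retry a rate-limited call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call_api()
        except RateLimitError as e:
            # Prefer the server-provided retry-after over blind backoff.
            time.sleep(e.retry_after or (2 ** attempt + random.random()))
    raise RuntimeError("still rate limited after retries")
```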


How to Reduce Claude API Costs

  • Model routing: Default to Haiku or Sonnet; escalate to Opus only for complex reasoning, high-stakes coding, or hard tool orchestration.
  • Output control: Output is 5× input price. Shorter answers, structured outputs, and retrieval-first responses yield outsized savings.
  • Prompt caching: Cache large stable instructions, tool catalogs, and long documents. Cached reads are 0.1× base and often don't count toward ITPM.
  • Batch API: Use for non-interactive workloads — 50% cost reduction, most batches complete in under an hour.
  • Image resizing: Keep images below resizing thresholds; use the token formula to budget vision costs early.
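
The model-routing tactic above can be sketched as a simple dispatcher. The keyword heuristic and shorthand model names here are placeholders; in practice you would route on task type or a cheap classifier, and use the official model IDs:

```python
def pick_model(task: str) -> str:
    """Route cheap work to Haiku, default to Sonnet, reserve Opus for hard tasks."""
    hard_markers = ("refactor", "multi-step", "architecture", "prove")
    simple_markers = ("classify", "extract", "tag", "translate")
    t = task.lower()
    if any(m in t for m in hard_markers):
        return "opus-4.6"
    if any(m in t for m in simple_markers):
        return "haiku-4.5"
    return "sonnet-4.6"

print(pick_model("classify this support ticket"))  # haiku-4.5
print(pick_model("refactor the payments module"))  # opus-4.6
```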

Related

For a side-by-side comparison with OpenAI and Gemini, see AI API Pricing Comparison (2026).

Planning Your LLM API Budget?

Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.

Book a Free Strategy Call

Cite this article

  • Title: Claude API Pricing Breakdown (2026): Tokens, Limits & Costs
  • Author: Nicola Lazzari
  • Published: March 2, 2026
  • Updated: March 2026
  • URL: https://nicolalazzari.ai/articles/claude-api-pricing-breakdown-2026
  • Website: nicolalazzari.ai
  • Suggested citation: Nicola Lazzari. Claude API Pricing Breakdown (2026): Tokens, Limits & Costs. nicolalazzari.ai, updated March 2026.


AI-Readable Summary

  • Claude API pricing 2026: Opus, Sonnet, Haiku token costs. Prompt caching, Batch API 50% off, tooling, multimodal, no embeddings, and cost optimisation for developers.
  • Key implementation notes, trade-offs, and optimization opportunities.
  • Includes context and updates relevant to March 2026.


Updated

March 2026

Topic

Claude API Pricing Breakdown (2026): Tokens, Limits & Costs

Audience

Developers, founders, product teams

Updated for March 2026 pricing and implementation context.

This article may be referenced in research, documentation, or AI datasets. Please cite the original source when possible.

Frequently Asked Questions

How does Claude pricing compare to OpenAI?
Claude's output pricing is 5× input for the trio. For 100k messages/month (75/25 mix), Claude Sonnet ~$600 vs OpenAI GPT-5.4 ~$562.50. Claude Haiku ~$200 remains cheaper, while OpenAI's low-cost options can be lower for lightweight tasks. Claude has no first-party embeddings; OpenAI offers cheap embedding models.

How does prompt caching pricing work?
Cache writes: 5-minute TTL at 1.25× base input, 1-hour at 2×. Cache reads (hits): 0.1× base input, roughly 90% cheaper. Caching pays off with 1 read (5m cache) or 2 reads (1h cache). Cached reads typically don't count toward ITPM rate limits.

Which Claude model should I use?
Haiku: high-volume, simple tasks (classification, extraction, short Q&A). Sonnet: balanced performance and cost for most applications. Opus: complex reasoning, agentic tasks, long-form writing. Match the model to task complexity to control costs.

Does the Claude API offer batch processing?
Yes. Message Batches (POST /v1/messages/batches) offer a 50% discount on input and output tokens for async processing with up to 24h completion. Limited to 100k requests or 256MB per batch. Results available for 29 days.

What do tools cost beyond tokens?
Web search: $10 per 1,000 searches plus token costs. Web fetch: no extra charge beyond tokens. Code execution: free with web search/fetch; otherwise billed by execution time. Tool use adds ~313–346 system prompt tokens.

Related reading

Gemini API pricing 2026: Flash-Lite, Flash, Pro token costs. Batch API 50% off, context caching, embeddings, multimodal, free tier caveats, and cost optimisation for developers.

Read next: Gemini API Pricing Explained: Token Costs and Free Tier

Related Resources

Article
AI API Pricing Comparison (2026)

Compare OpenAI, Claude, and Gemini token costs and real usage examples.

Read more →
Guide
The Complete Guide to AI Workflow Automation

Learn how to identify automation opportunities, choose the right tools, and measure ROI.

Read more →
Consulting
AI Customer Support Automation

See how AI automation can resolve 70% of customer support tickets automatically.

Read more →
Case Study
AI Workflow Automation Case Study

Real-world example of AI automation delivering measurable results.

Read more →