Gemini API Pricing Explained: Token Costs and Free Tier

Executive Summary
The Gemini API (Google AI for Developers) bills primarily by token: you pay for input tokens, output tokens (including thinking tokens where applicable), cached tokens, and cache storage duration. In 2026 the official pricing shows a wide gap between "Lite/Flash" and "Pro" models: for high-volume chat workloads, cheaper models drastically reduce unit cost; Pro models remain more expensive but useful when quality, agentic capability, and reasoning matter.
The Free Tier exists for many models (often listed as "Free of charge" for input and output), but it is subject to rate limits and terms of use, and content sent via unpaid services may be used to improve Google products. For apps serving users in the EEA, Switzerland, or the UK, the terms require the Paid Services when you expose API clients to end users; in practice, for B2C production in Europe, treat the Paid Tier as a requirement.
Two structural cost levers from official docs:
- Batch API: Async processing at ~50% of standard cost, with SLO up to 24h — ideal for embeddings, tagging, offline evaluations.
- Context caching: Reduces spend when you repeat large prefixes (instructions, documents, video) by paying reduced cached-token rates plus storage per hour.
How Gemini API Pricing Works
Gemini bills for:
- Input and output tokens (output explicitly includes thinking tokens where indicated)
- Cached tokens and cache storage time (TTL)
- Batch API at roughly 50% of standard cost
- Tooling: e.g. Grounding with Google Search has its own pricing (different for Gemini 3 vs 2.5)
- Video generation with Veo is billed per second
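These billing components can be combined into a simple per-request estimator. The function below is an illustrative sketch, not an official calculator; rates are passed in as USD per 1M tokens, taken from the official pricing page.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     cached_tokens: int = 0, storage_token_hours: float = 0.0,
                     *, in_rate: float, out_rate: float,
                     cached_rate: float = 0.0, storage_rate: float = 0.0) -> float:
    """Estimate one request's cost; all rates are USD per 1M tokens.

    output_tokens should include thinking tokens where the model bills them;
    storage_token_hours = cached tokens held x hours of cache TTL.
    """
    m = 1_000_000
    return (input_tokens / m * in_rate
            + output_tokens / m * out_rate
            + cached_tokens / m * cached_rate
            + storage_token_hours / m * storage_rate)

# Example: 10k-token prompt, 2k-token answer at Gemini 2.5 Flash list prices
print(request_cost_usd(10_000, 2_000, in_rate=0.30, out_rate=2.50))  # ≈ $0.008
```

Because video, tool pricing, and Veo are billed differently, this only covers the token-based part of a bill.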
Token Pricing Table (Paid Tier)
Official list prices (USD per 1M tokens). For models with tiered pricing (e.g. ≤200k vs >200k prompt tokens), the table shows the ≤200k rates, which cover typical chat scenarios.
| Model | Input (USD/1M) | Output (USD/1M) | Cached (USD/1M) | Cache Storage (USD/1M tok/hr) | Free Tier |
|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | $0.01 | $1.00 | Yes |
| Gemini 2.5 Flash | $0.30 | $2.50 | $0.03 | $1.00 | Yes |
| Gemini 2.5 Pro | $1.25 | $10.00 | $0.125 | $4.50 | Yes |
| Gemini 3.1 Flash-Lite Preview | $0.25 | $1.50 | $0.025 | $1.00 | Yes |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $0.05 | $1.00 | Yes |
| Gemini 3.1 Pro Preview | $2.00 | $12.00 | $0.20 | $4.50 | No |
Pro models have higher rates for prompts >200k tokens. Check the latest pricing page for your chosen model.
Batch API: 50% Cost Reduction
The Batch API is designed for high-volume async workloads at ~50% of standard cost with a 24h turnaround target. The pricing page shows Batch rates typically halved (e.g. Gemini 2.5 Flash: input $0.15 vs $0.30; output $1.25 vs $2.50).
Use Batch for embeddings, tagging, evaluations, and any workload that doesn't need real-time latency.
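As a quick sanity check on the 50% figure, halving the standard Gemini 2.5 Flash rates reproduces the Batch rates quoted above; the job below is a hypothetical 10M-token tagging workload used purely for illustration.

```python
standard = {"input": 0.30, "output": 2.50}  # Gemini 2.5 Flash, USD per 1M tokens
batch = {k: v * 0.5 for k, v in standard.items()}
print(batch)  # {'input': 0.15, 'output': 1.25}

# Hypothetical 10M-token tagging job (8M input / 2M output), standard vs Batch
std_cost = 8 * standard["input"] + 2 * standard["output"]
bat_cost = 8 * batch["input"] + 2 * batch["output"]
print(std_cost, bat_cost)  # 7.4 3.7
```

Halving applies per token rate, so the savings are the same whatever the input/output mix.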
Embeddings
gemini-embedding-001 is available on Free and Paid tiers. On Paid tier:
| Mode | Price (USD/1M tokens) | Cost: 1M docs × 512 tok |
|---|---|---|
| Standard | $0.15 | $76.80 |
| Batch | $0.075 | $38.40 |
For File Search, you pay for embedding at indexing time; query-time embedding is free. You then pay standard model token rates when retrieved chunks are included in the prompt.
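The arithmetic behind the table, as a runnable sketch using the gemini-embedding-001 list prices above:

```python
# Indexing cost for 1M documents at 512 tokens each with gemini-embedding-001
docs, tokens_per_doc = 1_000_000, 512
total_tokens = docs * tokens_per_doc           # 512M tokens
standard = total_tokens / 1e6 * 0.15           # standard rate, USD per 1M tokens
batch = total_tokens / 1e6 * 0.075             # Batch rate (50% off)
print(f"standard=${standard:.2f} batch=${batch:.2f}")  # standard=$76.80 batch=$38.40
```

Since indexing is rarely latency-sensitive, Batch is usually the right default for bulk embedding jobs.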
Model Families: 2.5 vs 3
Gemini 2.5 is stable and cost-competitive. Gemini 3 adds more agentic support and features (e.g. thinking levels, finer media resolution) but at higher token cost.
- Pro (2.5 Pro, 3.1 Pro): Higher cost per token; use when prompts need complex reasoning or coding. Pro models may have higher rates for prompts >200k tokens.
- Flash: Balanced choice for chatbots and general-purpose workflows with moderate cost and good multimodal capability.
- Flash-Lite: Maximises cost/volume for frequent, simple tasks (extraction, classification, light agentic). Gemini 3 Flash-Lite is in preview and targets high volume.
Preview models can have stricter rate limits and are deprecated with notice (typically ≥2 weeks). Check release notes for migration timelines.
Multimodal and media_resolution
media_resolution controls how many tokens are allocated to images, video frames, and PDF pages, trading quality against cost and latency. Estimated tokens per image:
- Gemini 2.5: LOW=64, MEDIUM=256; HIGH may include pan & scan (variable)
- Gemini 3: LOW=280, MEDIUM=560, HIGH=1120
Best practice: use HIGH for difficult visual tasks (OCR, small text); MEDIUM for PDFs; LOW/MEDIUM for generic video frames.
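Because image tokens are billed as input tokens, resolution choice translates directly into dollars. The sketch below uses the estimated per-image token counts listed above (they are estimates, not guaranteed values):

```python
# Approximate image tokens per media_resolution (estimates from this article)
IMAGE_TOKENS = {
    "gemini-2.5": {"LOW": 64, "MEDIUM": 256},
    "gemini-3": {"LOW": 280, "MEDIUM": 560, "HIGH": 1120},
}

def image_input_cost(family: str, resolution: str, n_images: int,
                     in_rate: float) -> float:
    """Input-token cost (USD) of n images at a given media_resolution."""
    return IMAGE_TOKENS[family][resolution] * n_images / 1e6 * in_rate

# 10k images at Gemini 3 Flash Preview input rate ($0.50/1M): MEDIUM vs HIGH
print(image_input_cost("gemini-3", "MEDIUM", 10_000, 0.50))  # 2.8
print(image_input_cost("gemini-3", "HIGH", 10_000, 0.50))    # 5.6
```

Doubling resolution from MEDIUM to HIGH doubles the image-token spend, so reserve HIGH for tasks that genuinely need it.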
Real Cost Examples
Chatbot: 100k messages/month, 1k tokens/msg, 75% input / 25% output
75M input + 25M output tokens monthly.
| Model | Monthly Cost | Per message |
|---|---|---|
| Gemini 2.5 Flash-Lite | $17.50 | $0.000175 |
| Gemini 2.5 Flash | $85.00 | $0.000850 |
| Gemini 2.5 Pro | $343.75 | $0.00344 |
| Gemini 3.1 Flash-Lite Preview | $56.25 | $0.00056 |
| Gemini 3 Flash Preview | $112.50 | $0.00113 |
| Gemini 3.1 Pro Preview | $450.00 | $0.00450 |
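The table above can be reproduced directly from the list prices; the loop below covers a subset of the models as a sketch:

```python
# Chatbot scenario: 100k msgs/month, 1k tokens/msg, 75% input / 25% output
RATES = {  # (input, output) in USD per 1M tokens, from the pricing table
    "2.5 Flash-Lite": (0.10, 0.40),
    "2.5 Flash": (0.30, 2.50),
    "2.5 Pro": (1.25, 10.00),
    "3.1 Pro Preview": (2.00, 12.00),
}
in_m, out_m = 75, 25  # millions of tokens per month
for model, (i, o) in RATES.items():
    monthly = in_m * i + out_m * o
    print(f"{model}: ${monthly:.2f}/mo, ${monthly / 100_000:.6f}/msg")
# 2.5 Flash-Lite: $17.50/mo, $0.000175/msg
```

Swapping the rate pairs for other models reproduces the remaining rows.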
Embedding pipeline: 1M documents × 512 tokens
512M tokens. Standard: $76.80. Batch: $38.40.
Multimodal: 10k images/month (MEDIUM resolution)
Assume 50 text input tokens and 200 output tokens per image, plus image tokens at MEDIUM resolution: 256 (Gemini 2.5) or 560 (Gemini 3).
| Model | Monthly Cost |
|---|---|
| Gemini 2.5 Flash-Lite | $1.11 |
| Gemini 2.5 Flash | $5.92 |
| Gemini 2.5 Pro | $23.82 |
| Gemini 3.1 Flash-Lite Preview | $4.53 |
| Gemini 3 Flash Preview | $9.05 |
| Gemini 3.1 Pro Preview | $36.20 |
Sensitivity: Token Volume and Output Share
At 100k messages/month, cost varies with tokens per message and output share:
| Model | 500 tok/msg | 1000 tok/msg | 2000 tok/msg |
|---|---|---|---|
| 2.5 Flash-Lite (25% out) | $8.75 | $17.50 | $35.00 |
| 2.5 Flash (25% out) | $42.50 | $85.00 | $170.00 |
| 2.5 Pro (25% out) | $171.88 | $343.75 | $687.50 |
Shifting output from 15% to 40% increases cost significantly — output is priced higher than input.
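The effect of output share can be checked directly. In this sketch (Gemini 2.5 Flash list prices, 100k messages of 1k tokens), the monthly bill nearly doubles between 15% and 40% output share:

```python
def monthly_cost(msgs: int, tok_per_msg: int, out_share: float,
                 in_rate: float, out_rate: float) -> float:
    """Monthly USD cost as a function of output-token share."""
    total_m = msgs * tok_per_msg / 1e6  # total tokens, in millions
    return total_m * ((1 - out_share) * in_rate + out_share * out_rate)

# Gemini 2.5 Flash ($0.30 in / $2.50 out): 15% vs 40% output share
low = monthly_cost(100_000, 1_000, 0.15, 0.30, 2.50)
high = monthly_cost(100_000, 1_000, 0.40, 0.30, 2.50)
print(round(low, 2), round(high, 2))  # 63.0 118.0
```

Verbose system prompts inflate input; verbose answers inflate the far more expensive output side.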
Hidden Costs and Limits
Context caching storage: Caching has two cost components: cached tokens (reduced rate) and storage per hour. A 100k-token cache with 24h TTL: Flash storage ~$2.40, Pro storage ~$10.80. Use short TTLs when possible.
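A useful way to decide whether caching pays off: per 1M cached tokens, each cache hit saves roughly the input rate minus the cached rate, while storage costs accrue per hour. The break-even formula below is my own framing, not from the official docs, using the list prices above:

```python
def breakeven_hits_per_hour(in_rate: float, cached_rate: float,
                            storage_rate: float) -> float:
    """Cache hits per hour needed for token savings to cover storage cost.

    Per 1M cached tokens: each hit saves (in_rate - cached_rate) USD,
    while storage costs storage_rate USD per hour.
    """
    return storage_rate / (in_rate - cached_rate)

# Gemini 2.5 Flash: $0.30 input, $0.03 cached, $1.00 storage per 1M tok/hr
print(round(breakeven_hits_per_hour(0.30, 0.03, 1.00), 2))  # ≈ 3.7 hits/hour
```

Below that hit rate, the cache costs more in storage than it saves in tokens, which is why short TTLs matter.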
Grounding with Google Search: Gemini 3 bills per search query; Gemini 2.5 bills per grounded prompt. After free quotas: Gemini 3 ~$14/1k queries; Gemini 2.5 ~$35/1k prompts. Enable grounding only when freshness matters; prefer File Search for proprietary knowledge.
Rate limits: RPM, TPM, RPD apply per project (not per API key). RPD resets at midnight Pacific. Limits vary by tier and model — check AI Studio for your project.
Batch limits: 100 concurrent batches; 2GB max input file; 20GB storage. Enqueued token limits vary by model and tier.
How to Reduce Gemini API Costs
- Model routing: Use Flash-Lite/Flash as default; route to Pro only for low-confidence, parsing failures, or hard tasks. Price gaps can be ~20× between Lite and Pro on output.
- Batch for non-interactive workloads: Embeddings, tagging, evaluations — Batch halves cost and has dedicated limits.
- Context caching with short TTL: Cache only high-reuse prefixes (policies, manuals) with minimal useful TTL; measure cache hits.
- media_resolution as cost lever: Use MEDIUM or LOW unless you need OCR or fine detail; reserve HIGH for diagrams and text-heavy images.
- Grounding only when needed: On Gemini 3 you pay per query; prompts that trigger 2–3 queries add up. Use File Search for internal knowledge.
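The model-routing idea above can be sketched as a cheap-first fallback loop. Everything here is illustrative: the model names are placeholders, `call_model` is a hypothetical caller you would implement with your SDK of choice, and the `confidence` field and 0.8 threshold assume your prompt asks the model for self-reported confidence in its JSON output.

```python
import json

CHEAP, STRONG = "gemini-2.5-flash-lite", "gemini-2.5-pro"  # placeholder names

def route(prompt: str, call_model) -> dict:
    """Try the cheap model first; escalate on parse failure or low confidence.

    call_model(model, prompt) must return the model's raw text response.
    """
    for model in (CHEAP, STRONG):
        raw = call_model(model, prompt)
        try:
            result = json.loads(raw)  # structured-output check
            if model == STRONG or result.get("confidence", 0.0) >= 0.8:
                return {"model": model, **result}
        except json.JSONDecodeError:
            pass  # malformed output: fall through to the stronger model
    return {"model": STRONG, "error": "no usable response"}
```

With most traffic resolved by the cheap model, the blended per-message cost stays close to the Lite rate while hard cases still get Pro quality.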
Related
For a side-by-side comparison with OpenAI and Claude, see AI API Pricing Comparison (2026).
Planning Your LLM API Budget?
Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.
Book a Free Strategy Call

Cite this article
- Title: Gemini API Pricing Explained: Token Costs and Free Tier
- Author: Nicola Lazzari
- Published: March 3, 2026
- Updated: March 2026
- URL: https://nicolalazzari.ai/articles/gemini-api-pricing-explained-2026
- Website: nicolalazzari.ai
- Suggested citation: Nicola Lazzari. Gemini API Pricing Explained: Token Costs and Free Tier. nicolalazzari.ai, updated March 2026.
Sources used
Primary sources
- Gemini API pricing: https://ai.google.dev/pricing
- Gemini API docs: https://ai.google.dev/gemini-api/docs
AI-Readable Summary
- Gemini API pricing 2026: Flash-Lite, Flash, Pro token costs. Batch API 50% off, context caching, embeddings, multimodal, free tier caveats, and cost optimisation for developers.
Related reading
- Read next: OpenAI API Pricing Explained (2026): Cost per Token & Real Examples (GPT-5.4, GPT-5, Mini, Nano token costs, Batch API 50% off, embeddings, tools, and multimodal).
- AI API Pricing Comparison (2026): compare OpenAI, Claude, and Gemini token costs with real usage examples.