Gemini API Pricing Explained: Token Costs and Free Tier

Nicola Lazzari
Gemini API pricing comparison showing Flash and Flash-Lite token costs

Executive Summary

The Gemini API (Google AI for Developers) bills primarily by token: you pay for input tokens, output tokens (including thinking tokens where applicable), cached tokens, and cache storage duration. In 2026 the official pricing shows a wide gap between "Lite/Flash" and "Pro" models: for high-volume chat workloads, cheaper models drastically reduce unit cost; Pro models remain more expensive but useful when quality, agentic capability, and reasoning matter.

A Free Tier exists for many models (often listed as "Free of charge" for input and output), but it is subject to rate limits and terms of use: content sent via unpaid services may be used to improve Google products. For apps serving users in the EEA, Switzerland, or the UK, the terms require the Paid Services whenever you expose API clients to end users; in practice, treat the Paid Tier as a requirement for B2C production in Europe.

Two structural cost levers from official docs:

  • Batch API: asynchronous processing at ~50% of standard cost, with a turnaround SLO of up to 24 hours; ideal for embeddings, tagging, and offline evaluations.
  • Context caching: reduces spend when you repeat large prefixes (instructions, documents, video) by paying reduced cached-token rates plus a per-hour storage fee.

How Gemini API Pricing Works

Gemini bills for:

  • Input and output tokens (output explicitly includes thinking tokens where indicated)
  • Cached tokens and cache storage time (TTL)
  • Batch API at roughly 50% of standard cost
  • Tooling: e.g. Grounding with Google Search has its own pricing (different for Gemini 3 vs 2.5)
  • Video generation with Veo is billed per second
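
Put together, the token-billed components above reduce to a simple per-request cost formula. The sketch below is a hypothetical helper (not part of any SDK); the rates plugged in are the Gemini 2.5 Flash list prices quoted in this article:

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,               # includes thinking tokens where billed
    input_rate: float,                # USD per 1M input tokens
    output_rate: float,               # USD per 1M output tokens
    cached_tokens: int = 0,
    cached_rate: float = 0.0,         # USD per 1M cached tokens
    storage_token_hours: float = 0.0, # cached tokens (in millions) x hours stored
    storage_rate: float = 0.0,        # USD per 1M cached tokens per hour
) -> float:
    """Approximate Gemini API cost for the token-billed components of a workload."""
    per_m = 1_000_000
    cost = (input_tokens / per_m) * input_rate
    cost += (output_tokens / per_m) * output_rate
    cost += (cached_tokens / per_m) * cached_rate
    cost += storage_token_hours * storage_rate
    return cost

# Gemini 2.5 Flash list prices: $0.30 in / $2.50 out per 1M tokens,
# applied to 75M input + 25M output tokens per month
monthly = request_cost_usd(75_000_000, 25_000_000, 0.30, 2.50)
```

The same function reproduces the chatbot scenarios later in the article by swapping in each model's rates.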

Token Pricing Table (Paid Tier)

Official list prices (USD per 1M tokens). For models with tiered pricing (e.g. prompts ≤200k vs >200k tokens), the table shows the ≤200k rates, which cover typical chat scenarios.

  Model                           Input     Output    Cached    Cache Storage     Free Tier
                                  (USD/1M)  (USD/1M)  (USD/1M)  (USD/1M tok/hr)
  Gemini 2.5 Flash-Lite           $0.10     $0.40     $0.01     $1.00             Yes
  Gemini 2.5 Flash                $0.30     $2.50     $0.03     $1.00             Yes
  Gemini 2.5 Pro                  $1.25     $10.00    $0.125    $4.50             Yes
  Gemini 3.1 Flash-Lite Preview   $0.25     $1.50     $0.025    $1.00             Yes
  Gemini 3 Flash Preview          $0.50     $3.00     $0.05     $1.00             Yes
  Gemini 3.1 Pro Preview          $2.00     $12.00    $0.20     $4.50             No

Pro models have higher rates for prompts >200k tokens. Check the latest pricing page for your chosen model.


Batch API: 50% Cost Reduction

The Batch API is designed for high-volume async workloads at ~50% of standard cost with a 24h turnaround target. The pricing page shows Batch rates typically halved (e.g. Gemini 2.5 Flash: input $0.15 vs $0.30; output $1.25 vs $2.50).

Use Batch for embeddings, tagging, evaluations, and any workload that doesn't need real-time latency.


Embeddings

gemini-embedding-001 is available on Free and Paid tiers. On Paid tier:

  Mode       Price (USD/1M tokens)   1M docs × 512 tok
  Standard   $0.15                   $76.80
  Batch      $0.075                  $38.40

For File Search: you pay embedding at indexing time; query-time embedding is free. You then pay model tokens when retrieved chunks are included in the prompt.
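
The indexing figures in the table above follow directly from token volume times rate. A minimal sketch, using the gemini-embedding-001 list prices quoted in this article:

```python
# Indexing cost for an embedding corpus with gemini-embedding-001.
# Rates are the list prices quoted in this article.
DOCS = 1_000_000
TOKENS_PER_DOC = 512
STANDARD_RATE = 0.15   # USD per 1M tokens
BATCH_RATE = 0.075     # USD per 1M tokens (~50% of Standard)

total_tokens = DOCS * TOKENS_PER_DOC                 # 512M tokens
standard_cost = total_tokens / 1e6 * STANDARD_RATE   # one-off indexing, Standard
batch_cost = total_tokens / 1e6 * BATCH_RATE         # one-off indexing, Batch
```

Since query-time embedding is free with File Search, this indexing pass is the only embedding line item for that setup.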


Model Families: 2.5 vs 3

Gemini 2.5 is stable and cost-competitive. Gemini 3 adds more agentic support and features (e.g. thinking levels, finer media resolution) but at higher token cost.

  • Pro (2.5 Pro, 3.1 Pro): Higher cost per token; use when prompts need complex reasoning or coding. Pro models may have higher rates for prompts >200k tokens.
  • Flash: Balanced choice for chatbots and general-purpose workflows with moderate cost and good multimodal capability.
  • Flash-Lite: Maximises cost/volume for frequent, simple tasks (extraction, classification, light agentic). Gemini 3 Flash-Lite is in preview and targets high volume.

Preview models can have stricter rate limits and are deprecated with notice (typically ≥2 weeks). Check release notes for migration timelines.


Multimodal and media_resolution

media_resolution controls tokens allocated to images, video, and PDF — balancing quality vs cost and latency. Estimated tokens per image:

  • Gemini 2.5: LOW=64, MEDIUM=256; HIGH may include pan & scan (variable)
  • Gemini 3: LOW=280, MEDIUM=560, HIGH=1120

Best practice: use HIGH for difficult visual tasks (OCR, small text); MEDIUM for PDFs; LOW/MEDIUM for generic video frames.
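
For budgeting, the per-image estimates above can be turned into a rough input-token budget. The helper below is illustrative (the figures are the estimates quoted in this article; actual counts can vary, and 2.5 HIGH is omitted because its pan & scan token usage is variable):

```python
# Estimated image tokens per media_resolution setting (figures from this article).
IMAGE_TOKENS = {
    "gemini-2.5": {"LOW": 64, "MEDIUM": 256},
    "gemini-3":   {"LOW": 280, "MEDIUM": 560, "HIGH": 1120},
}

def image_input_tokens(family: str, resolution: str, n_images: int) -> int:
    """Rough input-token budget contributed by images alone."""
    return IMAGE_TOKENS[family][resolution] * n_images

# 10k images/month at MEDIUM on Gemini 3 adds ~5.6M input tokens
budget = image_input_tokens("gemini-3", "MEDIUM", 10_000)
```

Multiply the result by the model's input rate per 1M tokens to see the media share of a multimodal bill.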


Real Cost Examples

Chatbot: 100k messages/month, 1k tokens/msg, 75% input / 25% output

75M input + 25M output tokens monthly.

  Model                           Monthly Cost   Per message
  Gemini 2.5 Flash-Lite           $17.50         $0.000175
  Gemini 2.5 Flash                $85.00         $0.000850
  Gemini 2.5 Pro                  $343.75        $0.00344
  Gemini 3.1 Flash-Lite Preview   $56.25         $0.00056
  Gemini 3 Flash Preview          $112.50        $0.00113
  Gemini 3.1 Pro Preview          $450.00        $0.00450

Embedding pipeline: 1M documents × 512 tokens

512M tokens. Standard: $76.80. Batch: $38.40.

Multimodal: 10k images/month (MEDIUM resolution)

Assume 50 text input tokens and 200 output tokens per image, plus image input tokens at MEDIUM resolution: 256 (Gemini 2.5) or 560 (Gemini 3).

  Model                           Monthly Cost
  Gemini 2.5 Flash-Lite           $1.11
  Gemini 2.5 Flash                $5.92
  Gemini 2.5 Pro                  $23.82
  Gemini 3.1 Flash-Lite Preview   $4.53
  Gemini 3 Flash Preview          $9.05
  Gemini 3.1 Pro Preview          $36.20

Sensitivity: Token Volume and Output Share

At 100k messages/month, cost varies with tokens per message and output share:

  Model                      500 tok/msg   1000 tok/msg   2000 tok/msg
  2.5 Flash-Lite (25% out)   $8.75         $17.50         $35.00
  2.5 Flash (25% out)        $42.50        $85.00         $170.00
  2.5 Pro (25% out)          $171.88       $343.75        $687.50

Shifting output from 15% to 40% increases cost significantly — output is priced higher than input.
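
To put a number on the output-share effect, the sketch below (a hypothetical helper, using the Gemini 2.5 Flash list prices from this article) compares 15% vs 40% output at 100k messages/month and 1k tokens/msg:

```python
def monthly_cost(messages: int, tokens_per_msg: int, output_share: float,
                 input_rate: float, output_rate: float) -> float:
    """Monthly cost (USD) for a traffic profile; rates are USD per 1M tokens."""
    total = messages * tokens_per_msg
    in_tok = total * (1 - output_share)
    out_tok = total * output_share
    return in_tok / 1e6 * input_rate + out_tok / 1e6 * output_rate

# Gemini 2.5 Flash: $0.30 in / $2.50 out per 1M tokens
low_out = monthly_cost(100_000, 1_000, 0.15, 0.30, 2.50)    # 15% output share
high_out = monthly_cost(100_000, 1_000, 0.40, 0.30, 2.50)   # 40% output share
```

On these assumptions the same traffic costs roughly $63 at a 15% output share and roughly $118 at 40%, because each token moved from input to output is billed at the higher output rate.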


Hidden Costs and Limits

Context caching storage: Caching has two cost components: cached tokens (reduced rate) and storage per hour. A 100k-token cache with 24h TTL: Flash storage ~$2.40, Pro storage ~$10.80. Use short TTLs when possible.
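
The storage figures above follow from cache size times TTL times the per-hour rate. A minimal sketch, with the storage rates from the pricing table in this article:

```python
def cache_storage_cost(cached_tokens: int, ttl_hours: float,
                       storage_rate: float) -> float:
    """Cache storage cost (USD); rate is USD per 1M cached tokens per hour."""
    return cached_tokens / 1e6 * ttl_hours * storage_rate

# 100k-token cache held for a 24h TTL
flash_storage = cache_storage_cost(100_000, 24, 1.00)  # Flash-tier storage rate
pro_storage = cache_storage_cost(100_000, 24, 4.50)    # Pro-tier storage rate
```

Halving the TTL halves this line item, which is why short TTLs matter for infrequently hit caches.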

Grounding with Google Search: Gemini 3 bills per search query; Gemini 2.5 bills per grounded prompt. After free quotas: Gemini 3 ~$14/1k queries; Gemini 2.5 ~$35/1k prompts. Enable grounding only when freshness matters; prefer File Search for proprietary knowledge.
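
Because the billing unit differs by family (per search query on Gemini 3, per grounded prompt on Gemini 2.5), grounding estimates need to be computed separately. A sketch using the post-quota figures quoted above:

```python
# Post-free-quota grounding rates quoted in this article (USD per unit).
GROUNDING_RATE = {
    "gemini-3": 14.0 / 1_000,     # per search query
    "gemini-2.5": 35.0 / 1_000,   # per grounded prompt
}

monthly_g3 = 10_000 * GROUNDING_RATE["gemini-3"]     # 10k queries, ~ $140
monthly_g25 = 10_000 * GROUNDING_RATE["gemini-2.5"]  # 10k grounded prompts, ~ $350
```

Note that on Gemini 3 a single prompt can trigger multiple billed queries, so the query count, not the prompt count, drives the bill.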

Rate limits: RPM, TPM, RPD apply per project (not per API key). RPD resets at midnight Pacific. Limits vary by tier and model — check AI Studio for your project.

Batch limits: 100 concurrent batches; 2GB max input file; 20GB storage. Enqueued token limits vary by model and tier.


How to Reduce Gemini API Costs

  • Model routing: Use Flash-Lite/Flash as default; route to Pro only for low-confidence, parsing failures, or hard tasks. Price gaps can be ~20× between Lite and Pro on output.
  • Batch for non-interactive workloads: Embeddings, tagging, evaluations — Batch halves cost and has dedicated limits.
  • Context caching with short TTL: Cache only high-reuse prefixes (policies, manuals) with minimal useful TTL; measure cache hits.
  • media_resolution as cost lever: Use MEDIUM or LOW unless you need OCR or fine detail; reserve HIGH for diagrams and text-heavy images.
  • Grounding only when needed: On Gemini 3 you pay per query; prompts that trigger 2–3 queries add up. Use File Search for internal knowledge.
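
The model-routing lever can be sketched as a small dispatch function. Everything here is illustrative: the confidence signal and thresholds are hypothetical application-side heuristics, not part of any official API; only the model IDs correspond to real Gemini models.

```python
# Hypothetical router: default to cheap models, escalate to Pro only when a
# heuristic flags the task as hard or a first attempt failed to parse.
def pick_model(task_confidence: float, parse_failed: bool) -> str:
    if parse_failed or task_confidence < 0.6:
        return "gemini-2.5-pro"         # escalate: reasoning-heavy fallback
    if task_confidence < 0.85:
        return "gemini-2.5-flash"       # balanced default
    return "gemini-2.5-flash-lite"      # cheap path for easy, high-volume tasks
```

With output rates spanning $0.40 to $10.00 per 1M tokens across these three models, keeping most traffic on the bottom branch is where the savings come from.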

Related

For a side-by-side comparison with OpenAI and Claude, see AI API Pricing Comparison (2026).

Planning Your LLM API Budget?

Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.

Book a Free Strategy Call

Cite this article

  • Title: Gemini API Pricing Explained: Token Costs and Free Tier
  • Author: Nicola Lazzari
  • Published: March 3, 2026
  • Updated: March 2026
  • URL: https://nicolalazzari.ai/articles/gemini-api-pricing-explained-2026
  • Website: nicolalazzari.ai
  • Suggested citation: Nicola Lazzari. Gemini API Pricing Explained: Token Costs and Free Tier. nicolalazzari.ai, updated March 2026.


Frequently Asked Questions

Is the Gemini API free to use?
Yes. Many Gemini models offer "Free of charge" input/output on the Free Tier, but with rate limits and terms that may allow Google to use content to improve products. For B2C production in the EEA, Switzerland, or UK, terms require Paid Services when exposing API clients to end users.

Are thinking tokens billed as output?
For models with extended thinking (chain-of-thought), output pricing includes the model's internal reasoning tokens, not just the final response. This raises output costs. Verify whether your model variant bills thinking as output before scaling.

How much cheaper is Flash-Lite than Flash?
Flash-Lite is ~3x cheaper on input ($0.10 vs $0.30 per 1M) and ~6x on output ($0.40 vs $2.50 per 1M). It's optimised for simple, high-volume tasks; Flash offers better capability for complex reasoning. At 100k messages/month (75/25 mix), Flash-Lite costs ~$17.50 vs ~$85 for Flash.

How does Batch API pricing work?
The Batch API charges ~50% of standard token rates for async jobs with up to 24h turnaround. It is ideal for embeddings, tagging, and offline evaluations. Batch embedding with gemini-embedding-001 is $0.075/1M tokens vs $0.15 Standard.

How do Gemini API rate limits work?
RPD (requests per day) resets at midnight Pacific. Limits are per project, not per API key. RPM and TPM vary by model and tier. Check AI Studio for your project's current limits.

Related reading

OpenAI API pricing 2026: GPT-5.4, GPT-5, Mini, Nano token costs. Batch API 50% off, embeddings, tools, multimodal, and cost optimisation for developers.

Read next: OpenAI API Pricing Explained (2026): Cost per Token & Real Examples

Related Resources

Article
AI API Pricing Comparison (2026)

Compare OpenAI, Claude, and Gemini token costs and real usage examples.

Read more →
Guide
The Complete Guide to AI Workflow Automation

Learn how to identify automation opportunities, choose the right tools, and measure ROI.

Read more →
Consulting
AI Customer Support Automation

See how AI automation can resolve 70% of customer support tickets automatically.

Read more →
Case Study
AI Workflow Automation Case Study

Real-world example of AI automation delivering measurable results.

Read more →