Gemini API Pricing Explained: Token Costs and Free Tier

Nicola Lazzari
Gemini API pricing comparison showing Flash and Flash-Lite token costs

Executive Summary

The Gemini API (Google AI for Developers) bills primarily by token: you pay for input tokens, output tokens (including thinking tokens where applicable), cached tokens, and cache storage duration. In 2026 the official pricing shows a wide gap between "Lite/Flash" and "Pro" models: for high-volume chat workloads, cheaper models drastically reduce unit cost; Pro models remain more expensive but useful when quality, agentic capability, and reasoning matter.

A Free Tier exists for many models (often listed as "Free of charge" for input and output), but it is subject to rate limits and terms of use: content sent via unpaid services may be used to improve Google products. For apps serving users in the EEA, Switzerland, or the UK, the terms require the Paid Services whenever you expose API clients to end users; in practice, treat the Paid Tier as a requirement for B2C production in Europe.

Two structural cost levers from official docs:

  • Batch API: asynchronous processing at ~50% of standard cost, with a turnaround SLO of up to 24 hours; ideal for embeddings, tagging, and offline evaluations.
  • Context caching: reduces spend when you repeat large prefixes (instructions, documents, video) by paying reduced cached-token rates plus a per-hour storage fee.

How Gemini API Pricing Works

Gemini bills for:

  • Input and output tokens (output explicitly includes thinking tokens where indicated)
  • Cached tokens and cache storage time (TTL)
  • Batch API at roughly 50% of standard cost
  • Tooling: e.g. Grounding with Google Search has its own pricing (different for Gemini 3 vs 2.5)
  • Video generation with Veo is billed per second
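
Put together, the token-billed components above reduce to a simple per-request cost formula. The sketch below is a hypothetical helper (not part of any SDK); the rates plugged in are the Gemini 2.5 Flash list prices quoted in this article:

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,               # includes thinking tokens where billed
    input_rate: float,                # USD per 1M input tokens
    output_rate: float,               # USD per 1M output tokens
    cached_tokens: int = 0,
    cached_rate: float = 0.0,         # USD per 1M cached tokens
    storage_token_hours: float = 0.0, # cached tokens (in millions) x hours stored
    storage_rate: float = 0.0,        # USD per 1M cached tokens per hour
) -> float:
    """Approximate Gemini API cost for the token-billed components of a workload."""
    per_m = 1_000_000
    cost = (input_tokens / per_m) * input_rate
    cost += (output_tokens / per_m) * output_rate
    cost += (cached_tokens / per_m) * cached_rate
    cost += storage_token_hours * storage_rate
    return cost

# Gemini 2.5 Flash list prices: $0.30 in / $2.50 out per 1M tokens,
# applied to 75M input + 25M output tokens per month
monthly = request_cost_usd(75_000_000, 25_000_000, 0.30, 2.50)
```

The same function reproduces the chatbot scenarios later in the article by swapping in each model's rates.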

Token Pricing Table (Paid Tier)

Official list prices (USD per 1M tokens). For models with tiered pricing (e.g. prompts ≤200k vs >200k tokens), the table shows the ≤200k rates, which cover typical chat scenarios.

  Model                           Input     Output    Cached    Cache Storage     Free Tier
                                  (USD/1M)  (USD/1M)  (USD/1M)  (USD/1M tok/hr)
  Gemini 2.5 Flash-Lite           $0.10     $0.40     $0.01     $1.00             Yes
  Gemini 2.5 Flash                $0.30     $2.50     $0.03     $1.00             Yes
  Gemini 2.5 Pro                  $1.25     $10.00    $0.125    $4.50             Yes
  Gemini 3.1 Flash-Lite Preview   $0.25     $1.50     $0.025    $1.00             Yes
  Gemini 3 Flash Preview          $0.50     $3.00     $0.05     $1.00             Yes
  Gemini 3.1 Pro Preview          $2.00     $12.00    $0.20     $4.50             No

Pro models have higher rates for prompts >200k tokens. Check the latest pricing page for your chosen model.


Batch API: 50% Cost Reduction

The Batch API is designed for high-volume async workloads at ~50% of standard cost with a 24h turnaround target. The pricing page shows Batch rates typically halved (e.g. Gemini 2.5 Flash: input $0.15 vs $0.30; output $1.25 vs $2.50).

Use Batch for embeddings, tagging, evaluations, and any workload that doesn't need real-time latency.


Embeddings

gemini-embedding-001 is available on Free and Paid tiers. On Paid tier:

  Mode       Price (USD/1M tokens)   1M docs × 512 tok
  Standard   $0.15                   $76.80
  Batch      $0.075                  $38.40

For File Search: you pay embedding at indexing time; query-time embedding is free. You then pay model tokens when retrieved chunks are included in the prompt.
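
The indexing figures in the table above follow directly from token volume times rate. A minimal sketch, using the gemini-embedding-001 list prices quoted in this article:

```python
# Indexing cost for an embedding corpus with gemini-embedding-001.
# Rates are the list prices quoted in this article.
DOCS = 1_000_000
TOKENS_PER_DOC = 512
STANDARD_RATE = 0.15   # USD per 1M tokens
BATCH_RATE = 0.075     # USD per 1M tokens (~50% of Standard)

total_tokens = DOCS * TOKENS_PER_DOC                 # 512M tokens
standard_cost = total_tokens / 1e6 * STANDARD_RATE   # one-off indexing, Standard
batch_cost = total_tokens / 1e6 * BATCH_RATE         # one-off indexing, Batch
```

Since query-time embedding is free with File Search, this indexing pass is the only embedding line item for that setup.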


Model Families: 2.5 vs 3

Gemini 2.5 is stable and cost-competitive. Gemini 3 adds more agentic support and features (e.g. thinking levels, finer media resolution) but at higher token cost.

  • Pro (2.5 Pro, 3.1 Pro): Higher cost per token; use when prompts need complex reasoning or coding. Pro models may have higher rates for prompts >200k tokens.
  • Flash: Balanced choice for chatbots and general-purpose workflows with moderate cost and good multimodal capability.
  • Flash-Lite: Maximises cost/volume for frequent, simple tasks (extraction, classification, light agentic). Gemini 3 Flash-Lite is in preview and targets high volume.

Preview models can have stricter rate limits and are deprecated with notice (typically ≥2 weeks). Check release notes for migration timelines.


Multimodal and media_resolution

media_resolution controls tokens allocated to images, video, and PDF — balancing quality vs cost and latency. Estimated tokens per image:

  • Gemini 2.5: LOW=64, MEDIUM=256; HIGH may include pan & scan (variable)
  • Gemini 3: LOW=280, MEDIUM=560, HIGH=1120

Best practice: use HIGH for difficult visual tasks (OCR, small text); MEDIUM for PDFs; LOW/MEDIUM for generic video frames.
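
For budgeting, the per-image estimates above can be turned into a rough input-token budget. The helper below is illustrative (the figures are the estimates quoted in this article; actual counts can vary, and 2.5 HIGH is omitted because its pan & scan token usage is variable):

```python
# Estimated image tokens per media_resolution setting (figures from this article).
IMAGE_TOKENS = {
    "gemini-2.5": {"LOW": 64, "MEDIUM": 256},
    "gemini-3":   {"LOW": 280, "MEDIUM": 560, "HIGH": 1120},
}

def image_input_tokens(family: str, resolution: str, n_images: int) -> int:
    """Rough input-token budget contributed by images alone."""
    return IMAGE_TOKENS[family][resolution] * n_images

# 10k images/month at MEDIUM on Gemini 3 adds ~5.6M input tokens
budget = image_input_tokens("gemini-3", "MEDIUM", 10_000)
```

Multiply the result by the model's input rate per 1M tokens to see the media share of a multimodal bill.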


Real Cost Examples

Chatbot: 100k messages/month, 1k tokens/msg, 75% input / 25% output

75M input + 25M output tokens monthly.

  Model                           Monthly Cost   Per message
  Gemini 2.5 Flash-Lite           $17.50         $0.000175
  Gemini 2.5 Flash                $85.00         $0.000850
  Gemini 2.5 Pro                  $343.75        $0.00344
  Gemini 3.1 Flash-Lite Preview   $56.25         $0.00056
  Gemini 3 Flash Preview          $112.50        $0.00113
  Gemini 3.1 Pro Preview          $450.00        $0.00450

Embedding pipeline: 1M documents × 512 tokens

512M tokens. Standard: $76.80. Batch: $38.40.

Multimodal: 10k images/month (MEDIUM resolution)

Assume 50 text input tokens and 200 output tokens per image, plus image input tokens at MEDIUM resolution: 256 (Gemini 2.5) or 560 (Gemini 3).

  Model                           Monthly Cost
  Gemini 2.5 Flash-Lite           $1.11
  Gemini 2.5 Flash                $5.92
  Gemini 2.5 Pro                  $23.82
  Gemini 3.1 Flash-Lite Preview   $4.53
  Gemini 3 Flash Preview          $9.05
  Gemini 3.1 Pro Preview          $36.20

Sensitivity: Token Volume and Output Share

At 100k messages/month, cost varies with tokens per message and output share:

  Model                      500 tok/msg   1000 tok/msg   2000 tok/msg
  2.5 Flash-Lite (25% out)   $8.75         $17.50         $35.00
  2.5 Flash (25% out)        $42.50        $85.00         $170.00
  2.5 Pro (25% out)          $171.88       $343.75        $687.50

Shifting output from 15% to 40% increases cost significantly — output is priced higher than input.
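
To put a number on the output-share effect, the sketch below (a hypothetical helper, using the Gemini 2.5 Flash list prices from this article) compares 15% vs 40% output at 100k messages/month and 1k tokens/msg:

```python
def monthly_cost(messages: int, tokens_per_msg: int, output_share: float,
                 input_rate: float, output_rate: float) -> float:
    """Monthly cost (USD) for a traffic profile; rates are USD per 1M tokens."""
    total = messages * tokens_per_msg
    in_tok = total * (1 - output_share)
    out_tok = total * output_share
    return in_tok / 1e6 * input_rate + out_tok / 1e6 * output_rate

# Gemini 2.5 Flash: $0.30 in / $2.50 out per 1M tokens
low_out = monthly_cost(100_000, 1_000, 0.15, 0.30, 2.50)    # 15% output share
high_out = monthly_cost(100_000, 1_000, 0.40, 0.30, 2.50)   # 40% output share
```

On these assumptions the same traffic costs roughly $63 at a 15% output share and roughly $118 at 40%, because each token moved from input to output is billed at the higher output rate.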


Hidden Costs and Limits

Context caching storage: Caching has two cost components: cached tokens (reduced rate) and storage per hour. A 100k-token cache with 24h TTL: Flash storage ~$2.40, Pro storage ~$10.80. Use short TTLs when possible.
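
The storage figures above follow from cache size times TTL times the per-hour rate. A minimal sketch, with the storage rates from the pricing table in this article:

```python
def cache_storage_cost(cached_tokens: int, ttl_hours: float,
                       storage_rate: float) -> float:
    """Cache storage cost (USD); rate is USD per 1M cached tokens per hour."""
    return cached_tokens / 1e6 * ttl_hours * storage_rate

# 100k-token cache held for a 24h TTL
flash_storage = cache_storage_cost(100_000, 24, 1.00)  # Flash-tier storage rate
pro_storage = cache_storage_cost(100_000, 24, 4.50)    # Pro-tier storage rate
```

Halving the TTL halves this line item, which is why short TTLs matter for infrequently hit caches.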

Grounding with Google Search: Gemini 3 bills per search query; Gemini 2.5 bills per grounded prompt. After free quotas: Gemini 3 ~$14/1k queries; Gemini 2.5 ~$35/1k prompts. Enable grounding only when freshness matters; prefer File Search for proprietary knowledge.
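
Because the billing unit differs by family (per search query on Gemini 3, per grounded prompt on Gemini 2.5), grounding estimates need to be computed separately. A sketch using the post-quota figures quoted above:

```python
# Post-free-quota grounding rates quoted in this article (USD per unit).
GROUNDING_RATE = {
    "gemini-3": 14.0 / 1_000,     # per search query
    "gemini-2.5": 35.0 / 1_000,   # per grounded prompt
}

monthly_g3 = 10_000 * GROUNDING_RATE["gemini-3"]     # 10k queries, ~ $140
monthly_g25 = 10_000 * GROUNDING_RATE["gemini-2.5"]  # 10k grounded prompts, ~ $350
```

Note that on Gemini 3 a single prompt can trigger multiple billed queries, so the query count, not the prompt count, drives the bill.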

Rate limits: RPM, TPM, RPD apply per project (not per API key). RPD resets at midnight Pacific. Limits vary by tier and model — check AI Studio for your project.

Batch limits: 100 concurrent batches; 2GB max input file; 20GB storage. Enqueued token limits vary by model and tier.


How to Reduce Gemini API Costs

  • Model routing: Use Flash-Lite/Flash as default; route to Pro only for low-confidence, parsing failures, or hard tasks. Price gaps can be ~20× between Lite and Pro on output.
  • Batch for non-interactive workloads: Embeddings, tagging, evaluations — Batch halves cost and has dedicated limits.
  • Context caching with short TTL: Cache only high-reuse prefixes (policies, manuals) with minimal useful TTL; measure cache hits.
  • media_resolution as cost lever: Use MEDIUM or LOW unless you need OCR or fine detail; reserve HIGH for diagrams and text-heavy images.
  • Grounding only when needed: On Gemini 3 you pay per query; prompts that trigger 2–3 queries add up. Use File Search for internal knowledge.
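
The model-routing lever can be sketched as a small dispatch function. Everything here is illustrative: the confidence signal and thresholds are hypothetical application-side heuristics, not part of any official API; only the model IDs correspond to real Gemini models.

```python
# Hypothetical router: default to cheap models, escalate to Pro only when a
# heuristic flags the task as hard or a first attempt failed to parse.
def pick_model(task_confidence: float, parse_failed: bool) -> str:
    if parse_failed or task_confidence < 0.6:
        return "gemini-2.5-pro"         # escalate: reasoning-heavy fallback
    if task_confidence < 0.85:
        return "gemini-2.5-flash"       # balanced default
    return "gemini-2.5-flash-lite"      # cheap path for easy, high-volume tasks
```

With output rates spanning $0.40 to $10.00 per 1M tokens across these three models, keeping most traffic on the bottom branch is where the savings come from.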

Related

For a side-by-side comparison with OpenAI and Claude, see AI API Pricing Comparison (2026).

Planning Your LLM API Budget?

Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.

Book a Free Strategy Call

Cite this article

  • Title: Gemini API Pricing Explained: Token Costs and Free Tier
  • Author: Nicola Lazzari
  • Published: March 3, 2026
  • Updated: March 2026
  • URL: https://nicolalazzari.ai/articles/gemini-api-pricing-explained-2026
  • Website: nicolalazzari.ai
  • Suggested citation: Nicola Lazzari. Gemini API Pricing Explained: Token Costs and Free Tier. nicolalazzari.ai, updated March 2026.


Frequently Asked Questions

Is the Gemini API free to use?
Yes. Many Gemini models offer "Free of charge" input/output on the Free Tier, but with rate limits and terms that may allow Google to use content to improve products. For B2C production in the EEA, Switzerland, or UK, terms require Paid Services when exposing API clients to end users.

Are thinking tokens billed as output?
For models with extended thinking (chain-of-thought), output pricing includes the model's internal reasoning tokens, not just the final response. This raises output costs. Verify whether your model variant bills thinking as output before scaling.

How much cheaper is Flash-Lite than Flash?
Flash-Lite is ~3x cheaper on input ($0.10 vs $0.30 per 1M) and ~6x on output ($0.40 vs $2.50 per 1M). It's optimised for simple, high-volume tasks; Flash offers better capability for complex reasoning. At 100k messages/month (75/25 mix), Flash-Lite costs ~$17.50 vs ~$85 for Flash.

How does Batch API pricing work?
The Batch API charges ~50% of standard token rates for async jobs with up to 24h turnaround. It is ideal for embeddings, tagging, and offline evaluations. Batch embedding with gemini-embedding-001 is $0.075/1M tokens vs $0.15 Standard.

How do Gemini API rate limits work?
RPD (requests per day) resets at midnight Pacific. Limits are per project, not per API key. RPM and TPM vary by model and tier. Check AI Studio for your project's current limits.

Related reading

OpenAI API pricing 2026: GPT-5.4, GPT-5, Mini, Nano token costs. Batch API 50% off, embeddings, tools, multimodal, and cost optimisation for developers.

Read next: OpenAI API Pricing Explained (2026): Cost per Token & Real Examples

Related Resources

Article
AI API Pricing Comparison (2026)

Compare OpenAI, Claude, and Gemini token costs and real usage examples.

Read more →
Guide
The Complete Guide to AI Workflow Automation

Learn how to identify automation opportunities, choose the right tools, and measure ROI.

Read more →
Consulting
AI Customer Support Automation

See how AI automation can resolve 70% of customer support tickets automatically.

Read more →
Case Study
AI Workflow Automation Case Study

Real-world example of AI automation delivering measurable results.

Read more →