Gemini API Pricing Explained: Token Costs and Free Tier

nicolalazzari
Gemini API pricing comparison showing Flash and Flash-Lite token costs

Gemini Pricing at a Glance

Google's Gemini API is competitively priced. Some models charge different rates for prompts of up to 200k tokens and prompts over 200k tokens. For models that use chain-of-thought reasoning, output pricing can also include thinking tokens, which is an easy cost to overlook.

Model                Input (per 1M tokens)    Output (per 1M tokens)
Gemini Flash         $0.30                    $2.50
Gemini Flash-Lite    $0.10                    $0.40

Some models have tiered pricing based on prompt size (e.g. ≤200k vs >200k tokens). Check the latest pricing page for your chosen model.
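To see how a prompt-size tier changes a bill, here is a minimal sketch. It assumes the tier is chosen by the prompt's token count and that the selected rate then applies to the whole request; confirm how your model applies its tiers on the pricing page. The `TieredRate` type and `request_cost` function are hypothetical names, and the prices in the example are placeholders, not published Gemini rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredRate:
    """Hypothetical tiered price list, in USD per 1M tokens."""
    threshold: int       # prompt size (tokens) above which the higher tier applies
    input_low: float     # input price when prompt <= threshold
    output_low: float    # output price when prompt <= threshold
    input_high: float    # input price when prompt > threshold
    output_high: float   # output price when prompt > threshold


def request_cost(rate: TieredRate, prompt_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, assuming the prompt size picks the tier for the whole request."""
    if prompt_tokens <= rate.threshold:
        in_price, out_price = rate.input_low, rate.output_low
    else:
        in_price, out_price = rate.input_high, rate.output_high
    return (prompt_tokens * in_price + output_tokens * out_price) / 1_000_000


# Placeholder rates for illustration only: the high tier is simply double the low tier.
example = TieredRate(threshold=200_000, input_low=1.00, output_low=4.00,
                     input_high=2.00, output_high=8.00)
print(request_cost(example, 150_000, 2_000))  # under the threshold
print(request_cost(example, 250_000, 2_000))  # over it: the whole request uses the higher rates
```

Under this assumption, crossing the threshold by even one token reprices the entire prompt, so trimming context just below the cut-off can matter more than the raw token count suggests.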


Thinking Tokens: The Hidden Cost

For models that use extended thinking (chain-of-thought reasoning), output pricing can include thinking tokens. That means the model's internal reasoning is billed as output, not just the final response. The per-token rate stays the same, but you pay it on more tokens, so a thinking model's output bill can be much higher than the length of its visible answers suggests.

Always verify whether your model variant includes thinking in output billing before scaling up.
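A quick way to estimate the effect is to price output as visible response tokens plus thinking tokens, both at the output rate. The sketch below uses the Gemini Flash output rate from the pricing table; the 750 thinking tokens per chat are a hypothetical figure for illustration only, since real thinking usage varies by prompt and model. In the google-genai Python SDK, per-response counts are reported on `response.usage_metadata` (for example `candidates_token_count` and `thoughts_token_count`); verify those field names against your SDK version before relying on them.

```python
FLASH_OUTPUT_PER_1M = 2.50  # USD per 1M output tokens (Gemini Flash, from the pricing table)


def output_cost(response_tokens: int, thinking_tokens: int,
                price_per_1m: float = FLASH_OUTPUT_PER_1M) -> float:
    """Output cost in USD for one response when thinking tokens are billed at the output rate."""
    billed_output = response_tokens + thinking_tokens
    return billed_output * price_per_1m / 1_000_000


chats = 100_000
visible_only = chats * output_cost(250, 0)
with_thinking = chats * output_cost(250, 750)  # 750 thinking tokens per chat: hypothetical
print(f"Output cost without thinking: ${visible_only:,.2f}")
print(f"Output cost with thinking:    ${with_thinking:,.2f}")
```

The visible answers are the same length in both cases, but billed output quadruples (250 to 1,000 tokens per chat), and so does the output cost.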


Rate Limits and Quotas

Rate limits are applied per project and vary by model. Daily quotas (RPD, requests per day) reset at midnight Pacific time. At scale, both RPD and RPM (requests per minute) limits can catch you out.

Google's rate limits documentation is essential reading. Plan for quota headroom and consider the free tier for development and low-volume production.
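One way to stay inside an RPM limit is to throttle on the client side. The class below is a minimal sliding-window sketch, not part of any Google SDK: `RpmLimiter` and its methods are hypothetical names, `rpm` should be set from your project's actual limit, and it only counts requests made by this one process. Because limits are per project, several workers sharing a project would need a shared counter instead.

```python
import time
from collections import deque
from typing import Callable, Deque


class RpmLimiter:
    """Allow at most `rpm` requests in any rolling 60-second window (single process only)."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.clock = clock                 # injectable so the limiter can be tested with a fake clock
        self.sent: Deque[float] = deque()  # timestamps of requests still inside the window

    def wait_time(self) -> float:
        """Seconds to wait before another request is allowed; 0.0 means send now."""
        now = self.clock()
        # Forget requests that are 60 or more seconds old.
        while self.sent and now - self.sent[0] >= 60.0:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        # Window is full: wait until the oldest request ages out.
        return 60.0 - (now - self.sent[0])

    def acquire(self) -> None:
        """Block until a request is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            time.sleep(delay)
        self.sent.append(self.clock())
```

Call `acquire()` immediately before each API request. A sliding window is conservative: if no 60-second window ever holds more than `rpm` requests, no calendar minute can either, which is the safer choice when you don't know exactly how the server counts minutes.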


Real Workload Example

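Scenario: 100,000 chats/month, each with 1,000 input tokens and 250 output tokens, for a total of 100M input and 25M output tokens. No thinking tokens are counted here; any would be billed on top at the output rate.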

Model                Calculation (M tokens × price per 1M)    Monthly Cost
Gemini Flash-Lite    (100 × $0.10) + (25 × $0.40)             $20.00
Gemini Flash         (100 × $0.30) + (25 × $2.50)             $92.50
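The table's arithmetic is easy to reproduce and rerun with your own volumes. This sketch hard-codes the per-1M-token prices quoted in the pricing table (check them against the current pricing page before use) and, like the table, counts no thinking tokens. `PRICES` and `monthly_cost` are illustrative names.

```python
# USD per 1M tokens, as quoted in the pricing table.
PRICES = {
    "Gemini Flash-Lite": {"input": 0.10, "output": 0.40},
    "Gemini Flash": {"input": 0.30, "output": 2.50},
}


def monthly_cost(model: str, chats: int, input_per_chat: int, output_per_chat: int) -> float:
    """Monthly cost in USD for a fixed number of chats with fixed token counts per chat."""
    price = PRICES[model]
    input_millions = chats * input_per_chat / 1_000_000
    output_millions = chats * output_per_chat / 1_000_000
    return input_millions * price["input"] + output_millions * price["output"]


for model in PRICES:
    cost = monthly_cost(model, chats=100_000, input_per_chat=1_000, output_per_chat=250)
    print(f"{model}: ${cost:,.2f}")
```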

Flash-Lite: Shockingly Cheap — With Tradeoffs

At $20/month for 100k chats, Gemini Flash-Lite is the cheapest option in this comparison. But capability tradeoffs apply:

  • Simpler tasks only: Flash-Lite is optimised for high-volume, low-complexity workloads. Don't expect the same reasoning quality as Flash or larger models.
  • Context and rate limits: Check context window limits and RPD/RPM for Flash-Lite. At scale, you may hit quotas before you hit cost limits.
  • Use case fit: Ideal for classification, extraction, short Q&A, and routing. For complex reasoning or long-form generation, step up to Flash or a flagship model.
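Before committing to Flash-Lite at scale, it helps to convert monthly volume into the per-day and per-minute figures that quotas are expressed in. This is a back-of-the-envelope sketch under stated assumptions: a 30-day month, one API request per chat, and a hypothetical `peak_factor` describing how much busier the busiest minute is than the average minute. The RPD and RPM limits are parameters because the real values depend on your model and tier; take them from Google's rate limits documentation.

```python
import math
from typing import Tuple


def quota_needs(chats_per_month: int, peak_factor: float, days: int = 30) -> Tuple[int, int]:
    """Return (average requests per day, estimated peak requests per minute).

    Assumes one API request per chat and a hypothetical peak_factor >= 1.
    """
    per_day = math.ceil(chats_per_month / days)
    avg_per_minute = per_day / (24 * 60)
    peak_rpm = math.ceil(avg_per_minute * peak_factor)
    return per_day, peak_rpm


def fits(chats_per_month: int, peak_factor: float, rpd_limit: int, rpm_limit: int) -> bool:
    """True if the estimated daily and peak-minute demand stay within the given limits."""
    per_day, peak_rpm = quota_needs(chats_per_month, peak_factor)
    return per_day <= rpd_limit and peak_rpm <= rpm_limit


print(quota_needs(100_000, peak_factor=10.0))  # (3334, 24)
```

At 100,000 chats a month, the average load is only about 2.3 requests per minute; it is the daily total and traffic bursts that decide whether a quota, rather than cost, becomes the constraint.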

If your workload fits, Flash-Lite can dramatically reduce API spend: in the example above it costs $20 a month versus $92.50 for Flash. Set realistic quality expectations and keep an eye on rate limits.


Frequently Asked Questions

Does the Gemini API have a free tier?
Yes. Google offers a free tier for the Gemini API with limited requests per minute and per day. It's suitable for development and low-volume production. Check the official pricing and rate limits documentation for current free tier limits.

How are thinking tokens billed?
For models that use extended thinking (chain-of-thought), output pricing can include the model's internal reasoning tokens, not just the final response. You pay the output rate on those tokens too, so output costs can be well above what the visible response length suggests. Verify whether your model variant bills thinking as output before scaling.

How does Flash-Lite compare to Flash?
Flash-Lite is 3x cheaper on input ($0.10 vs $0.30 per 1M tokens) and roughly 6x cheaper on output ($0.40 vs $2.50 per 1M tokens). It's optimised for simple, high-volume tasks, while Flash offers better capability for complex reasoning and long-form generation. Choose based on task complexity.

When do Gemini API quotas reset?
Daily quotas (RPD, requests per day) typically reset at midnight Pacific time. Rate limits are per project and vary by model. Check Google's rate limits documentation for the RPD and RPM (requests per minute) limits that apply to you to avoid surprises at scale.

