Gemini API Pricing Explained: Token Costs and Free Tier

Gemini Pricing at a Glance
Google's Gemini API offers competitive pricing, with some models showing different rates for prompts under vs over 200k tokens. Output pricing can include thinking tokens for models that use chain-of-thought — a gotcha to watch.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Gemini Flash | $0.30 | $2.50 |
| Gemini Flash-Lite | $0.10 | $0.40 |
Some models have tiered pricing based on prompt size (e.g. ≤200k vs >200k tokens). Check the latest pricing page for your chosen model.
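Tiered pricing is easy to get wrong in a spreadsheet, so here is a minimal sketch of the calculation. The two rates are illustrative placeholders, not official Gemini prices — always substitute the current figures from the pricing page:

```python
# Sketch: input cost with a prompt-size pricing tier at 200k tokens.
# rate_small / rate_large are ILLUSTRATIVE values, not official prices.
def input_cost_usd(prompt_tokens: int,
                   rate_small: float = 0.30,   # $ per 1M tokens, prompts <= 200k
                   rate_large: float = 0.60    # $ per 1M tokens, prompts > 200k
                   ) -> float:
    """Return the input cost in USD for a single prompt."""
    rate = rate_small if prompt_tokens <= 200_000 else rate_large
    return prompt_tokens / 1_000_000 * rate

small = input_cost_usd(150_000)  # billed at the small-prompt rate
large = input_cost_usd(250_000)  # crosses the tier, billed at the higher rate
```

Note that the tier applies per request based on prompt size, so a workload that occasionally sends very long prompts can cost more than its average token count suggests.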
Thinking Tokens: The Hidden Cost
For models that use extended thinking (chain-of-thought reasoning), output pricing can include thinking tokens. That means the model's internal reasoning is billed as output — not just the final response. If you're using a thinking model, expect higher output costs than the base rate suggests.
Always verify whether your model variant includes thinking in output billing before scaling up.
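The effect of thinking tokens on the bill is easiest to see with numbers. A minimal sketch, using the Flash output rate from the table above and a made-up thinking-token count for illustration:

```python
# Sketch: effective output cost when internal thinking tokens bill as output.
# The $2.50/1M rate mirrors the Flash row above; the thinking-token count
# here is a made-up illustration, not a measured value.
def output_cost_usd(response_tokens: int, thinking_tokens: int,
                    rate_per_m: float = 2.50) -> float:
    billed = response_tokens + thinking_tokens  # both count as output
    return billed / 1_000_000 * rate_per_m

# 250 visible tokens in the reply, but 1,000 tokens of internal reasoning:
naive = output_cost_usd(250, 0)        # what the base rate suggests
actual = output_cost_usd(250, 1_000)   # 5x the naive estimate
```

The ratio between visible and thinking tokens varies by model and prompt, which is why measuring it on your real traffic matters before projecting monthly costs.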
Rate Limits and Quotas
Rate limits are per project. Quotas reset at midnight Pacific. Limits vary by model — RPD (requests per day) and RPM (requests per minute) can surprise you at scale.
Google's rate limits documentation is essential reading. Plan for quota headroom and consider the free tier for development and low-volume production.
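When you do hit an RPM ceiling, the standard remedy is exponential backoff with jitter. A minimal sketch, where `RateLimitError` is a stand-in for whatever 429-style exception your client library actually raises:

```python
# Sketch: exponential backoff with full jitter for per-minute rate limits.
# RateLimitError is a placeholder for your client library's 429 exception.
import random
import time

class RateLimitError(Exception):
    pass

def call_with_backoff(fn, max_retries: int = 5):
    """Call fn(), retrying with jittered exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            # Full jitter: sleep a random amount up to 2^attempt seconds.
            time.sleep(random.uniform(0, 2 ** attempt))
    raise RuntimeError("rate limit: retries exhausted")
```

Backoff helps with bursty traffic, but it cannot rescue a workload whose sustained volume exceeds the daily (RPD) quota — that requires a higher tier or a different model.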
Real Workload Example
Take the same scenario used in the OpenAI pricing comparison: 100,000 chats/month, with 1,000 input and 250 output tokens per chat (100M input + 25M output tokens total).
| Model | Calculation | Monthly Cost |
|---|---|---|
| Gemini Flash-Lite | (100 × $0.10) + (25 × $0.40) | $20 |
| Gemini Flash | (100 × $0.30) + (25 × $2.50) | $92.50 |
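The table arithmetic is straightforward to reproduce — a small sketch using the per-1M rates listed above (100M input and 25M output tokens per month):

```python
# Sketch: monthly cost from token volumes (in millions) and per-1M rates.
def monthly_cost(input_m: float, output_m: float,
                 input_rate: float, output_rate: float) -> float:
    """Cost in USD for input_m million input and output_m million output tokens."""
    return input_m * input_rate + output_m * output_rate

flash_lite = monthly_cost(100, 25, 0.10, 0.40)  # -> 20.0
flash = monthly_cost(100, 25, 0.30, 2.50)       # -> 92.5
```

Swap in your own token volumes and the current rates to size a budget; for a thinking model, remember to inflate the output volume as discussed above.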
Flash-Lite: Shockingly Cheap — With Tradeoffs
At $20/month for 100k chats, Gemini Flash-Lite is the cheapest option in this comparison. But capability tradeoffs apply:
- Simpler tasks only: Flash-Lite is optimised for high-volume, low-complexity workloads. Don't expect the same reasoning quality as Flash or larger models.
- Context and rate limits: Check context window limits and RPD/RPM for Flash-Lite. At scale, you may hit quotas before you hit cost limits.
- Use case fit: Ideal for classification, extraction, short Q&A, and routing. For complex reasoning or long-form generation, step up to Flash or a flagship model.
If your workload fits, Flash-Lite can dramatically reduce API spend. Just size your expectations and monitor rate limits.
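One way to capture the savings without giving up quality everywhere is to route by task type. A toy sketch of the idea — the task labels and model names here are illustrative shorthand, not actual API model IDs:

```python
# Sketch: route simple, high-volume task types to Flash-Lite and everything
# else to Flash. Labels and model names are illustrative, not real API IDs.
SIMPLE_TASKS = {"classify", "extract", "route", "short_qa"}

def pick_model(task_type: str) -> str:
    """Return the cheaper model for simple tasks, the stronger one otherwise."""
    return "flash-lite" if task_type in SIMPLE_TASKS else "flash"
```

In practice a router like this is driven by request metadata or a cheap upstream classifier, and its accuracy should be spot-checked so that complex requests are not silently downgraded.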
Planning Your LLM API Budget?
Choosing the right model and optimising token usage can significantly reduce costs. If you need help sizing your API spend or designing cost-efficient AI workflows, let's discuss your requirements.
Book a Free Strategy Call
Related reading
OpenAI API pricing 2026: GPT-5.2, mini, and pro token costs. Real workload examples, Batch API 50% off, and cost optimisation tips for developers.
Read next: OpenAI API Pricing Explained (2026): Cost per Token & Real Examples →
Related Resources
- Compare OpenAI, Claude, and Gemini token costs and real usage examples.
- Learn how to identify automation opportunities, choose the right tools, and measure ROI.
- See how AI automation can resolve 70% of customer support tickets automatically.
- Real-world example of AI automation delivering measurable results.