Google Gemini Prompting Best Practices
by Google
About
Comprehensive guide to prompting best practices for Google Gemini models including Gemini 3.1 Pro, 3 Pro, 3 Flash, 2.5 Pro, 2.5 Flash, and 2.5 Flash-Lite. Covers universal techniques, model-specific tips, thinking mode, function calling, multimodal capabilities, grounding, caching, and agentic patterns sourced from official Google documentation.
Chapter 1: Introduction & Model Overview
Google Gemini is a family of multimodal large language models developed by Google DeepMind. Designed to understand and generate across text, images, audio, video, and code, the Gemini model family offers a range of options tailored to different performance, latency, and cost requirements. This chapter provides a comprehensive overview of the current model lineup, their capabilities, pricing, and guidance on choosing the right model for your use case.
Current Gemini Model Lineup
The Gemini model family is organized into three active series, each representing a generation of capability improvements. Understanding which series and model to use is the first decision you need to make when building with Gemini.
Gemini 3 Series (Preview)
The Gemini 3 series represents the latest generation of models, currently available in preview. These models introduce simplified thinking controls, improved conciseness, and enhanced multimodal capabilities.
- Gemini 3.1 Pro -- The most capable model in the 3 series, offering top-tier reasoning and generation quality. API ID: gemini-3.1-pro-preview.
- Gemini 3 Pro -- A highly capable reasoning model balancing quality and efficiency. API ID: gemini-3-pro-preview.
- Gemini 3 Flash -- Optimized for speed and cost efficiency while maintaining strong reasoning. API ID: gemini-3-flash-preview.
- Gemini 3 Pro Image (Nano Banana Pro) -- Specialized for professional-grade image generation with 4K output, advanced text rendering, and search grounding. API ID: gemini-3-pro-image-preview.
All Gemini 3 models share a 1 million token context window, a 64K maximum output, and a knowledge cutoff of January 2025. They introduce the new thinkingLevel parameter (replacing thinkingBudget) and default to more concise responses compared to previous generations.
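The thinkingLevel parameter lives in the request's generation config. The sketch below builds a generateContent request body in Python; the generationConfig / thinkingConfig / thinkingLevel field names follow the REST API's documented camelCase shape, but verify them against the current API reference before relying on this.

```python
# Sketch: a generateContent request body that sets the Gemini 3
# thinkingLevel control. Field names assume the REST API's
# generationConfig shape; confirm against the official reference.

def build_request(prompt: str, thinking_level: str = "high") -> dict:
    """Return a generateContent request body with a thinking level set."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingLevel": thinking_level},
        },
    }

body = build_request("Summarize the attached report.", thinking_level="low")
print(body["generationConfig"]["thinkingConfig"]["thinkingLevel"])  # low
```

The same body works for any Gemini 3 model ID; on 2.5-series models you would set thinkingBudget instead.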
Gemini 2.5 Series (Generally Available)
The Gemini 2.5 series is the current generally available (GA) production-ready lineup. These models offer strong reasoning with configurable thinking budgets and are suitable for production deployments.
- Gemini 2.5 Pro -- The most advanced GA reasoning model for complex tasks. API ID: gemini-2.5-pro. Input pricing: $1.25 per MTok (up to 200K context) / $2.50 per MTok (over 200K context). Output pricing: $10 per MTok (up to 200K) / $15 per MTok (over 200K). Thinking budget range: 128 to 32,768 tokens.
- Gemini 2.5 Flash -- The best price-performance reasoning model. API ID: gemini-2.5-flash. Input pricing: $0.30 per MTok. Output pricing: $2.50 per MTok. Thinking budget range: 0 to 24,576 tokens (can be fully disabled).
- Gemini 2.5 Flash-Lite -- The most cost-efficient model for high-volume workloads. API ID: gemini-2.5-flash-lite. Input pricing: $0.10 per MTok. Output pricing: $0.40 per MTok. No thinking support.
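Gemini 2.5 Pro's tiered pricing is easy to misestimate, so here is a small cost calculator using the figures listed above. It assumes the whole request is billed at the tier selected by its input (context) size; check the official pricing page for the exact tier-boundary rules.

```python
# Sketch: estimating Gemini 2.5 Pro request cost from the tiered list
# prices above ($1.25/$2.50 per MTok input, $10/$15 per MTok output).
# Assumption: the >200K-context rate applies to the whole request.

MTOK = 1_000_000

def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request."""
    over_200k = input_tokens > 200_000
    in_rate = 2.50 if over_200k else 1.25
    out_rate = 15.00 if over_200k else 10.00
    return (input_tokens / MTOK) * in_rate + (output_tokens / MTOK) * out_rate

# 100K tokens in, 5K out -> 0.1 * 1.25 + 0.005 * 10 = $0.175
print(round(gemini_25_pro_cost(100_000, 5_000), 3))
```

Crossing the 200K boundary roughly doubles the input rate, which is why long-context workloads often justify context caching (covered later in this guide).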
Gemini 2.0 (Deprecated)
- Gemini 2.0 Flash -- Previously the workhorse model, now deprecated in favor of the 2.5 series. Existing integrations should plan migration to Gemini 2.5 Flash or Flash-Lite.
Model Comparison Table
| Model | API ID | Status | Context Window | Max Output | Thinking | Input Price (per MTok) | Output Price (per MTok) | Best For |
|---|---|---|---|---|---|---|---|---|
| 3.1 Pro | gemini-3.1-pro-preview | Preview | 1M | 64K | thinkingLevel | $2.00 | $12.00 | Most demanding reasoning tasks |
| 3 Pro | gemini-3-pro-preview | Preview | 1M | 64K | thinkingLevel | $2.00 | $12.00 | Complex reasoning and generation |
| 3 Flash | gemini-3-flash-preview | Preview | 1M | 64K | thinkingLevel | $0.50 | $3.00 | Fast, cost-effective reasoning |
| 3 Pro Image | gemini-3-pro-image-preview | Preview | 1M | 64K | Yes | -- | -- | Professional image generation |
| 2.5 Pro | gemini-2.5-pro | GA | 1M | 64K | 128-32768 | $1.25 / $2.50 | $10.00 / $15.00 | Complex coding, math, analysis |
| 2.5 Flash | gemini-2.5-flash | GA | 1M | 64K | 0-24576 | $0.30 | $2.50 | High-volume reasoning tasks |
| 2.5 Flash-Lite | gemini-2.5-flash-lite | GA | 1M | 64K | None | $0.10 | $0.40 | Classification, extraction, high-volume |
| 2.0 Flash | gemini-2.0-flash | Deprecated | 1M | 64K | None | -- | -- | Legacy workloads (migrate away) |
Platform Availability
Gemini models are accessible through three primary platforms, each serving different audiences and use cases.
Gemini Developer API
The Gemini Developer API is the most direct way to access Gemini models. It provides RESTful endpoints and official SDKs in Python, JavaScript/TypeScript, Go, and Java. This is the recommended starting point for individual developers, startups, and rapid prototyping.
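For readers calling the REST endpoints directly, the sketch below composes the generateContent URL. The v1beta path shown matches the commonly documented endpoint for generativelanguage.googleapis.com, but confirm the current API version prefix in the official reference.

```python
# Sketch: composing a Gemini Developer API endpoint URL.
# The v1beta prefix is an assumption; check the current API reference.

BASE = "https://generativelanguage.googleapis.com/v1beta"

def generate_content_url(model_id: str) -> str:
    """Return the generateContent endpoint for a given model ID."""
    return f"{BASE}/models/{model_id}:generateContent"

print(generate_content_url("gemini-2.5-flash"))
```

In practice the official SDKs build this URL for you; constructing it by hand is mainly useful for curl testing and debugging proxies.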
Google AI Studio
Google AI Studio is a web-based IDE for experimenting with Gemini models. It provides a visual interface for testing prompts, adjusting parameters, and comparing model outputs without writing code. AI Studio is excellent for prompt development, few-shot example curation, and quick experimentation before committing to code.
Vertex AI
Vertex AI is Google Cloud's enterprise ML platform. It offers the same Gemini models but adds enterprise features including VPC security, data residency controls, IAM-based access management, model monitoring, batch prediction, and SLA-backed uptime guarantees. Use Vertex AI for production deployments in regulated industries or organizations with strict compliance requirements.
Specialized Variants
Beyond the core text-and-reasoning models, Google offers several specialized Gemini variants designed for specific tasks.
| Variant | Purpose |
|---|---|
| Flash Image / Nano Banana | Fast image generation optimized for high-volume, low-latency workflows |
| Nano Banana Pro | Professional 4K image generation with text rendering and search grounding |
| Live | Real-time voice and video streaming over WebSocket connections |
| TTS (Text-to-Speech) | High-quality speech synthesis with multiple voice options |
| Embeddings | Dense vector representations for semantic search and retrieval |
| Deep Research | Extended multi-step research workflows with automatic source gathering |
| Computer Use | Browser automation through screenshots and UI action commands |
| Robotics | Physical world interaction and robotic control planning |
Decision Framework for Choosing the Right Model
Selecting the right Gemini model depends on four key dimensions: task complexity, latency requirements, cost sensitivity, and feature needs.
By Task Complexity
- Simple tasks (classification, extraction, formatting): Use 2.5 Flash-Lite for maximum cost efficiency, or 3 Flash with thinkingLevel: minimal if you need the latest capabilities.
- Moderate tasks (summarization, Q&A, content generation): Use 2.5 Flash with dynamic thinking, or 3 Flash with thinkingLevel: medium.
- Complex tasks (multi-step reasoning, code generation, mathematical proofs): Use 2.5 Pro or 3 Pro with high thinking budgets.
- Most demanding tasks (competition math, novel algorithm design, deep analysis): Use 2.5 Pro with Deep Think (budget 32768) or 3.1 Pro with thinkingLevel: high.
By Latency Requirements
- Real-time / sub-second: Use the Live API variants for streaming, or 2.5 Flash-Lite for fastest batch responses.
- Interactive (1-5 seconds): Use 2.5 Flash or 3 Flash with moderate thinking.
- Batch / async: Use any model; consider batch API pricing discounts for non-urgent workloads.
By Cost Sensitivity
- Budget-constrained: Start with 2.5 Flash-Lite ($0.10/$0.40 per MTok) and only upgrade if quality is insufficient.
- Balanced: Use 2.5 Flash ($0.30/$2.50 per MTok) for the best quality-per-dollar with reasoning.
- Quality-first: Use 2.5 Pro or 3.1 Pro when output quality justifies the cost.
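The decision framework above can be encoded as a simple routing helper. The mapping mirrors this chapter's guidance (and the model IDs it lists); treat it as a starting point to tune for your own quality and latency measurements, not a fixed rule.

```python
# Sketch: routing tasks to a Gemini model per this chapter's guidance.
# The complexity categories and model IDs come from the lists above.

COMPLEXITY_TO_MODEL = {
    "simple": "gemini-2.5-flash-lite",        # classification, extraction
    "moderate": "gemini-2.5-flash",           # summarization, Q&A
    "complex": "gemini-2.5-pro",              # coding, multi-step reasoning
    "most-demanding": "gemini-3.1-pro-preview",  # competition math, deep analysis
}

def pick_model(complexity: str, quality_first: bool = False) -> str:
    """Pick a model ID; quality_first bumps simple/moderate work up a tier."""
    if quality_first and complexity in ("simple", "moderate"):
        return "gemini-2.5-pro"
    return COMPLEXITY_TO_MODEL[complexity]

print(pick_model("complex"))  # gemini-2.5-pro
```

A router like this pairs well with the budget advice above: start each category on the cheapest model and only promote tasks whose measured quality falls short.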
Free vs Paid vs Enterprise Tier Comparison
| Feature | Free Tier | Paid Tier | Enterprise (Vertex AI) |
|---|---|---|---|
| Rate limits | 15 RPM / 1,500 RPD | 2,000+ RPM | Custom / negotiated |
| Context caching | Limited | Full access | Full access + SLA |
| Batch API | Not available | 50% discount pricing | Full access |
| Grounding (Search) | 1,500 RPD free | $14-$35 per 1K queries | Volume pricing |
| Support | Community | Standard | Premium / dedicated |
| Data residency | No control | Limited | Full control |
| SLA | None | 99.9% | 99.95%+ |
| Compliance | Basic | SOC 2 | HIPAA, FedRAMP, etc. |
The free tier is suitable for development and testing. The paid tier unlocks production-level rate limits and features. Enterprise (Vertex AI) adds the security, compliance, and operational controls required by large organizations.
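The paid tier's 50% batch discount (from the table above) can make a large difference on high-volume workloads. A quick illustration using Gemini 2.5 Flash list prices:

```python
# Sketch: savings from the paid tier's 50% batch API discount,
# using the Gemini 2.5 Flash list prices quoted earlier
# ($0.30 input / $2.50 output per MTok).

MTOK = 1_000_000

def flash_cost(in_tok: int, out_tok: int, batch: bool = False) -> float:
    """Estimated USD cost for a 2.5 Flash workload, optionally batched."""
    rate = 0.5 if batch else 1.0
    return rate * (in_tok / MTOK * 0.30 + out_tok / MTOK * 2.50)

realtime = flash_cost(10 * MTOK, 2 * MTOK)              # 3.00 + 5.00 = $8.00
batched = flash_cost(10 * MTOK, 2 * MTOK, batch=True)   # half of that: $4.00
print(realtime, batched)
```

For non-urgent workloads (classification backfills, offline evaluation runs), routing through the batch API simply halves the bill, at the cost of asynchronous turnaround.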