CandleKeep
Google Cloud

Google Gemini Prompting Best Practices

by Google

Tags: best-practices, agents, llms, prompting-guide
Pages: 138
Format: markdown
Listed: February 22, 2026
Updated: February 22, 2026
Subscribers: 41

About

Comprehensive guide to prompting best practices for Google Gemini models including Gemini 3.1 Pro, 3 Pro, 3 Flash, 2.5 Pro, 2.5 Flash, and 2.5 Flash-Lite. Covers universal techniques, model-specific tips, thinking mode, function calling, multimodal capabilities, grounding, caching, and agentic patterns sourced from official Google documentation.

138 Chapters · 443 Topics · 138 Pages

Preview

Chapter 1: Introduction & Model Overview

Google Gemini is a family of multimodal large language models developed by Google DeepMind. Designed to understand and generate across text, images, audio, video, and code, the Gemini model family offers a range of options tailored to different performance, latency, and cost requirements. This chapter provides a comprehensive overview of the current model lineup, their capabilities, pricing, and guidance on choosing the right model for your use case.

Current Gemini Model Lineup

The Gemini model family is organized into three series, each representing a generation of capability improvements. Choosing the right series and model is the first decision you will make when building with Gemini.

Gemini 3 Series (Preview)

The Gemini 3 series represents the latest generation of models, currently available in preview. These models introduce simplified thinking controls, improved conciseness, and enhanced multimodal capabilities.

  • Gemini 3.1 Pro -- The most capable model in the 3 series, offering top-tier reasoning and generation quality. API ID: gemini-3.1-pro-preview.
  • Gemini 3 Pro -- A highly capable reasoning model balancing quality and efficiency. API ID: gemini-3-pro-preview.
  • Gemini 3 Flash -- Optimized for speed and cost efficiency while maintaining strong reasoning. API ID: gemini-3-flash-preview.
  • Gemini 3 Pro Image (Nano Banana Pro) -- Specialized for professional-grade image generation with 4K output, advanced text rendering, and search grounding. API ID: gemini-3-pro-image-preview.

All Gemini 3 models share a 1 million-token context window, a 64K-token maximum output, and a January 2025 knowledge cutoff. They introduce the new thinkingLevel parameter (replacing the thinkingBudget used by the 2.5 series) and default to more concise responses than previous generations.
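The difference between the two thinking controls can be sketched as plain request bodies. The camelCase field names below follow the REST-style generateContent conventions and are an assumption to verify against the current API reference:

```python
# Sketch of the thinking-control change, assuming REST-style camelCase
# field names for the generateContent endpoint.

# Gemini 2.5: thinking is steered by a numeric token budget.
config_gemini_25 = {
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 8192}  # tokens reserved for reasoning
    }
}

# Gemini 3: thinking is steered by a discrete level instead.
config_gemini_3 = {
    "generationConfig": {
        "thinkingConfig": {"thinkingLevel": "high"}  # e.g. "low" or "high"
    }
}
```

The practical consequence: with Gemini 3 you no longer tune a token count per request; you pick a level and let the model decide how much to think.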

Gemini 2.5 Series (Generally Available)

The Gemini 2.5 series is the current generally available (GA) lineup. These models offer strong reasoning with configurable thinking budgets and are suitable for production deployments.

  • Gemini 2.5 Pro -- The most advanced GA reasoning model for complex tasks. API ID: gemini-2.5-pro. Input pricing: $1.25 per MTok (up to 200K context) / $2.50 per MTok (over 200K context). Output pricing: $10 per MTok (up to 200K) / $15 per MTok (over 200K). Thinking budget range: 128 to 32,768 tokens.
  • Gemini 2.5 Flash -- The best price-performance reasoning model. API ID: gemini-2.5-flash. Input pricing: $0.30 per MTok. Output pricing: $2.50 per MTok. Thinking budget range: 0 to 24,576 tokens (can be fully disabled).
  • Gemini 2.5 Flash-Lite -- The most cost-efficient model for high-volume workloads. API ID: gemini-2.5-flash-lite. Input pricing: $0.10 per MTok. Output pricing: $0.40 per MTok. No thinking support.
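A quick way to compare these price points is a back-of-the-envelope estimator built only from the per-MTok rates listed above. This ignores caching, batch discounts, and thinking-token accounting, so treat it as a rough lower bound:

```python
# Rough cost estimator using only the per-MTok prices listed above.
# Real billing (context caching, batch discounts, thinking tokens) differs.
PRICES = {  # (input $/MTok, output $/MTok) for prompts up to 200K tokens
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def estimate_cost(model, input_tokens, output_tokens, long_context=False):
    inp, out = PRICES[model]
    if model == "gemini-2.5-pro" and long_context:  # prompts over 200K tokens
        inp, out = 2.50, 15.00
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# e.g. a 100K-token prompt with a 5K-token answer on 2.5 Flash:
print(round(estimate_cost("gemini-2.5-flash", 100_000, 5_000), 4))  # 0.0425
```

Note the asymmetry: output tokens cost roughly 4-8x more than input tokens on every model, so verbose responses dominate the bill long before large prompts do.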

Gemini 2.0 (Deprecated)

  • Gemini 2.0 Flash -- Previously the workhorse model, now deprecated in favor of the 2.5 series. Existing integrations should plan migration to Gemini 2.5 Flash or Flash-Lite.

Model Comparison Table

| Model | API ID | Status | Context Window | Max Output | Thinking | Input Price (per MTok) | Output Price (per MTok) | Best For |
|---|---|---|---|---|---|---|---|---|
| 3.1 Pro | gemini-3.1-pro-preview | Preview | 1M | 64K | thinkingLevel | $2.00 | $12.00 | Most demanding reasoning tasks |
| 3 Pro | gemini-3-pro-preview | Preview | 1M | 64K | thinkingLevel | $2.00 | $12.00 | Complex reasoning and generation |
| 3 Flash | gemini-3-flash-preview | Preview | 1M | 64K | thinkingLevel | $0.50 | $3.00 | Fast, cost-effective reasoning |
| 3 Pro Image | gemini-3-pro-image-preview | Preview | 1M | 64K | Yes | -- | -- | Professional image generation |
| 2.5 Pro | gemini-2.5-pro | GA | 1M | 64K | 128-32,768 | $1.25 / $2.50 | $10.00 / $15.00 | Complex coding, math, analysis |
| 2.5 Flash | gemini-2.5-flash | GA | 1M | 64K | 0-24,576 | $0.30 | $2.50 | High-volume reasoning tasks |
| 2.5 Flash-Lite | gemini-2.5-flash-lite | GA | 1M | 64K | None | $0.10 | $0.40 | Classification, extraction, high-volume |
| 2.0 Flash | gemini-2.0-flash | Deprecated | 1M | 64K | None | -- | -- | Legacy workloads (migrate away) |

Platform Availability

Gemini models are accessible through three primary platforms, each serving different audiences and use cases.

Gemini Developer API

The Gemini Developer API is the most direct way to access Gemini models. It provides RESTful endpoints and official SDKs in Python, JavaScript/TypeScript, Go, and Java. This is the recommended starting point for individual developers, startups, and rapid prototyping.
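Under the hood, the SDKs all issue the same HTTP request. As a minimal sketch, the request can be assembled by hand with just the standard library; the v1beta endpoint path and x-goog-api-key header reflect the public REST surface, but check them against current documentation before relying on them:

```python
# Minimal sketch of a generateContent request over REST. Building the
# request is shown here; actually sending it requires a valid API key.
import json
import os
import urllib.request

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    url = (f"https://generativelanguage.googleapis.com/v1beta/"
           f"models/{model}:generateContent")
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )

req = build_request("gemini-2.5-flash", "Say hello.",
                    os.environ.get("GEMINI_API_KEY", ""))
print(req.full_url)
# To send: urllib.request.urlopen(req) and parse the JSON response.
```

The official SDKs add retries, streaming, typed responses, and file handling on top of this shape, which is why they are preferred over raw REST for anything beyond a smoke test.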

Google AI Studio

Google AI Studio is a web-based IDE for experimenting with Gemini models. It provides a visual interface for testing prompts, adjusting parameters, and comparing model outputs without writing code. AI Studio is excellent for prompt development, few-shot example curation, and quick experimentation before committing to code.

Vertex AI

Vertex AI is Google Cloud's enterprise ML platform. It offers the same Gemini models but adds enterprise features including VPC security, data residency controls, IAM-based access management, model monitoring, batch prediction, and SLA-backed uptime guarantees. Use Vertex AI for production deployments in regulated industries or organizations with strict compliance requirements.

Specialized Variants

Beyond the core text-and-reasoning models, Google offers several specialized Gemini variants designed for specific tasks.

| Variant | Purpose |
|---|---|
| Flash Image / Nano Banana | Fast image generation optimized for high-volume, low-latency workflows |
| Nano Banana Pro | Professional 4K image generation with text rendering and search grounding |
| Live | Real-time voice and video streaming over WebSocket connections |
| TTS (Text-to-Speech) | High-quality speech synthesis with multiple voice options |
| Embeddings | Dense vector representations for semantic search and retrieval |
| Deep Research | Extended multi-step research workflows with automatic source gathering |
| Computer Use | Browser automation through screenshots and UI action commands |
| Robotics | Physical world interaction and robotic control planning |

Decision Framework for Choosing the Right Model

Selecting the right Gemini model depends on four key dimensions: task complexity, latency requirements, cost sensitivity, and feature needs.

By Task Complexity

  • Simple tasks (classification, extraction, formatting): Use 2.5 Flash-Lite for maximum cost efficiency, or 3 Flash with thinkingLevel: minimal if you need the latest capabilities.
  • Moderate tasks (summarization, Q&A, content generation): Use 2.5 Flash with dynamic thinking, or 3 Flash with thinkingLevel: medium.
  • Complex tasks (multi-step reasoning, code generation, mathematical proofs): Use 2.5 Pro or 3 Pro with high thinking budgets.
  • Most demanding tasks (competition math, novel algorithm design, deep analysis): Use 2.5 Pro with Deep Think (budget 32768) or 3.1 Pro with thinkingLevel: high.
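The complexity tiers above amount to a small routing table. The mapping below is illustrative (the API IDs are the ones listed earlier; the tier names and thinking values are this sketch's own convention):

```python
# Toy routing table mirroring the task-complexity tiers above.
# Tier names and thinking values are illustrative conventions.
ROUTES = {
    "simple":   ("gemini-2.5-flash-lite", None),       # classification, extraction
    "moderate": ("gemini-2.5-flash", "dynamic"),       # summarization, Q&A
    "complex":  ("gemini-2.5-pro", "high"),            # multi-step reasoning, code
    "hardest":  ("gemini-3.1-pro-preview", "high"),    # competition math, deep analysis
}

def pick_model(complexity: str) -> dict:
    model, thinking = ROUTES[complexity]
    return {"model": model, "thinking": thinking}

print(pick_model("simple")["model"])   # gemini-2.5-flash-lite
print(pick_model("complex")["model"])  # gemini-2.5-pro
```

In practice many teams start every task at the cheapest tier and promote it only when evaluation shows the quality gap, rather than routing by intuition.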

By Latency Requirements

  • Real-time / sub-second: Use the Live API variants for streaming, or 2.5 Flash-Lite for fastest batch responses.
  • Interactive (1-5 seconds): Use 2.5 Flash or 3 Flash with moderate thinking.
  • Batch / async: Use any model; consider batch API pricing discounts for non-urgent workloads.

By Cost Sensitivity

  • Budget-constrained: Start with 2.5 Flash-Lite ($0.10/$0.40 per MTok) and only upgrade if quality is insufficient.
  • Balanced: Use 2.5 Flash ($0.30/$2.50 per MTok) for the best quality-per-dollar with reasoning.
  • Quality-first: Use 2.5 Pro or 3.1 Pro when output quality justifies the cost.

Free vs Paid vs Enterprise Tier Comparison

| Feature | Free Tier | Paid Tier | Enterprise (Vertex AI) |
|---|---|---|---|
| Rate limits | 15 RPM / 1,500 RPD | 2,000+ RPM | Custom / negotiated |
| Context caching | Limited | Full access | Full access + SLA |
| Batch API | Not available | 50% discount pricing | Full access |
| Grounding (Search) | 1,500 RPD free | $14-$35 per 1K queries | Volume pricing |
| Support | Community | Standard | Premium / dedicated |
| Data residency | No control | Limited | Full control |
| SLA | None | 99.9% | 99.95%+ |
| Compliance | Basic | SOC 2 | HIPAA, FedRAMP, etc. |

The free tier is suitable for development and testing. The paid tier unlocks production-level rate limits and features. Enterprise (Vertex AI) adds the security, compliance, and operational controls required by large organizations.
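When developing against the free tier's 15 RPM limit, a simple client-side throttle avoids burning requests on 429 errors. A minimal sketch (server-side limits remain authoritative; production code should also handle 429 responses with backoff):

```python
# Client-side pacing for a requests-per-minute limit: space calls at
# least 60/rpm seconds apart. Illustrative; pair with 429 backoff in practice.
import time

class RpmThrottle:
    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm
        self.last = 0.0

    def wait(self) -> None:
        now = time.monotonic()
        sleep_for = self.last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last = time.monotonic()

throttle = RpmThrottle(rpm=15)  # free-tier limit from the table above
# Call throttle.wait() immediately before each API request.
```

At 15 RPM this yields one request every 4 seconds, which is fine for development loops but is the clearest signal that production traffic belongs on the paid tier.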


