CandleKeep
Google Cloud

Google Gemini Prompting Best Practices

by Google

Tags: best-practices, agents, llms, prompting-guide
Pages: 138
Format: markdown
Listed: February 22, 2026
Updated: February 22, 2026
Subscribers: 41

About

Comprehensive guide to prompting best practices for Google Gemini models including Gemini 3.1 Pro, 3 Pro, 3 Flash, 2.5 Pro, 2.5 Flash, and 2.5 Flash-Lite. Covers universal techniques, model-specific tips, thinking mode, function calling, multimodal capabilities, grounding, caching, and agentic patterns sourced from official Google documentation.

138 Chapters · 443 Topics · 138 Pages

Preview

Chapter 1: Introduction & Model Overview

Google Gemini is a family of multimodal large language models developed by Google DeepMind. Designed to understand and generate across text, images, audio, video, and code, the Gemini model family offers a range of options tailored to different performance, latency, and cost requirements. This chapter provides a comprehensive overview of the current model lineup, their capabilities, pricing, and guidance on choosing the right model for your use case.

Current Gemini Model Lineup

The Gemini model family is organized into three series, each representing a generation of capability improvements. Choosing the right series and model is the first decision you will make when building with Gemini.

Gemini 3 Series (Preview)

The Gemini 3 series represents the latest generation of models, currently available in preview. These models introduce simplified thinking controls, improved conciseness, and enhanced multimodal capabilities.

  • Gemini 3.1 Pro -- The most capable model in the 3 series, offering top-tier reasoning and generation quality. API ID: gemini-3.1-pro-preview.
  • Gemini 3 Pro -- A highly capable reasoning model balancing quality and efficiency. API ID: gemini-3-pro-preview.
  • Gemini 3 Flash -- Optimized for speed and cost efficiency while maintaining strong reasoning. API ID: gemini-3-flash-preview.
  • Gemini 3 Pro Image (Nano Banana Pro) -- Specialized for professional-grade image generation with 4K output, advanced text rendering, and search grounding. API ID: gemini-3-pro-image-preview.

All Gemini 3 models share a 1 million-token context window, a 64K-token maximum output, and a January 2025 knowledge cutoff. They introduce the new thinkingLevel parameter (replacing the thinkingBudget used by the 2.5 series) and default to more concise responses than previous generations.
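The difference between the two thinking controls can be sketched as plain request bodies. The camelCase field names below follow the REST-style generateContent conventions and are an assumption to verify against the current API reference:

```python
# Sketch of the thinking-control change, assuming REST-style camelCase
# field names for the generateContent endpoint.

# Gemini 2.5: thinking is steered by a numeric token budget.
config_gemini_25 = {
    "generationConfig": {
        "thinkingConfig": {"thinkingBudget": 8192}  # tokens reserved for reasoning
    }
}

# Gemini 3: thinking is steered by a discrete level instead.
config_gemini_3 = {
    "generationConfig": {
        "thinkingConfig": {"thinkingLevel": "high"}  # e.g. "low" or "high"
    }
}
```

The practical consequence: with Gemini 3 you no longer tune a token count per request; you pick a level and let the model decide how much to think.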

Gemini 2.5 Series (Generally Available)

The Gemini 2.5 series is the current generally available (GA) lineup. These models offer strong reasoning with configurable thinking budgets and are suitable for production deployments.

  • Gemini 2.5 Pro -- The most advanced GA reasoning model for complex tasks. API ID: gemini-2.5-pro. Input pricing: $1.25 per MTok (up to 200K context) / $2.50 per MTok (over 200K context). Output pricing: $10 per MTok (up to 200K) / $15 per MTok (over 200K). Thinking budget range: 128 to 32,768 tokens.
  • Gemini 2.5 Flash -- The best price-performance reasoning model. API ID: gemini-2.5-flash. Input pricing: $0.30 per MTok. Output pricing: $2.50 per MTok. Thinking budget range: 0 to 24,576 tokens (can be fully disabled).
  • Gemini 2.5 Flash-Lite -- The most cost-efficient model for high-volume workloads. API ID: gemini-2.5-flash-lite. Input pricing: $0.10 per MTok. Output pricing: $0.40 per MTok. No thinking support.
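A quick way to compare these price points is a back-of-the-envelope estimator built only from the per-MTok rates listed above. This ignores caching, batch discounts, and thinking-token accounting, so treat it as a rough lower bound:

```python
# Rough cost estimator using only the per-MTok prices listed above.
# Real billing (context caching, batch discounts, thinking tokens) differs.
PRICES = {  # (input $/MTok, output $/MTok) for prompts up to 200K tokens
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}

def estimate_cost(model, input_tokens, output_tokens, long_context=False):
    inp, out = PRICES[model]
    if model == "gemini-2.5-pro" and long_context:  # prompts over 200K tokens
        inp, out = 2.50, 15.00
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# e.g. a 100K-token prompt with a 5K-token answer on 2.5 Flash:
print(round(estimate_cost("gemini-2.5-flash", 100_000, 5_000), 4))  # 0.0425
```

Note the asymmetry: output tokens cost roughly 4-8x more than input tokens on every model, so verbose responses dominate the bill long before large prompts do.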

Gemini 2.0 (Deprecated)

  • Gemini 2.0 Flash -- Previously the workhorse model, now deprecated in favor of the 2.5 series. Existing integrations should plan migration to Gemini 2.5 Flash or Flash-Lite.

Model Comparison Table

| Model | API ID | Status | Context Window | Max Output | Thinking | Input Price (per MTok) | Output Price (per MTok) | Best For |
|---|---|---|---|---|---|---|---|---|
| 3.1 Pro | gemini-3.1-pro-preview | Preview | 1M | 64K | thinkingLevel | $2.00 | $12.00 | Most demanding reasoning tasks |
| 3 Pro | gemini-3-pro-preview | Preview | 1M | 64K | thinkingLevel | $2.00 | $12.00 | Complex reasoning and generation |
| 3 Flash | gemini-3-flash-preview | Preview | 1M | 64K | thinkingLevel | $0.50 | $3.00 | Fast, cost-effective reasoning |
| 3 Pro Image | gemini-3-pro-image-preview | Preview | 1M | 64K | Yes | -- | -- | Professional image generation |
| 2.5 Pro | gemini-2.5-pro | GA | 1M | 64K | 128-32,768 | $1.25 / $2.50 | $10.00 / $15.00 | Complex coding, math, analysis |
| 2.5 Flash | gemini-2.5-flash | GA | 1M | 64K | 0-24,576 | $0.30 | $2.50 | High-volume reasoning tasks |
| 2.5 Flash-Lite | gemini-2.5-flash-lite | GA | 1M | 64K | None | $0.10 | $0.40 | Classification, extraction, high-volume |
| 2.0 Flash | gemini-2.0-flash | Deprecated | 1M | 64K | None | -- | -- | Legacy workloads (migrate away) |

Platform Availability

Gemini models are accessible through three primary platforms, each serving different audiences and use cases.

Gemini Developer API

The Gemini Developer API is the most direct way to access Gemini models. It provides RESTful endpoints and official SDKs in Python, JavaScript/TypeScript, Go, and Java. This is the recommended starting point for individual developers, startups, and rapid prototyping.
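Under the hood, the SDKs all issue the same HTTP request. As a minimal sketch, the request can be assembled by hand with just the standard library; the v1beta endpoint path and x-goog-api-key header reflect the public REST surface, but check them against current documentation before relying on them:

```python
# Minimal sketch of a generateContent request over REST. Building the
# request is shown here; actually sending it requires a valid API key.
import json
import os
import urllib.request

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    url = (f"https://generativelanguage.googleapis.com/v1beta/"
           f"models/{model}:generateContent")
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )

req = build_request("gemini-2.5-flash", "Say hello.",
                    os.environ.get("GEMINI_API_KEY", ""))
print(req.full_url)
# To send: urllib.request.urlopen(req) and parse the JSON response.
```

The official SDKs add retries, streaming, typed responses, and file handling on top of this shape, which is why they are preferred over raw REST for anything beyond a smoke test.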

Google AI Studio

Google AI Studio is a web-based IDE for experimenting with Gemini models. It provides a visual interface for testing prompts, adjusting parameters, and comparing model outputs without writing code. AI Studio is excellent for prompt development, few-shot example curation, and quick experimentation before committing to code.

Vertex AI

Vertex AI is Google Cloud's enterprise ML platform. It offers the same Gemini models but adds enterprise features including VPC security, data residency controls, IAM-based access management, model monitoring, batch prediction, and SLA-backed uptime guarantees. Use Vertex AI for production deployments in regulated industries or organizations with strict compliance requirements.

Specialized Variants

Beyond the core text-and-reasoning models, Google offers several specialized Gemini variants designed for specific tasks.

| Variant | Purpose |
|---|---|
| Flash Image / Nano Banana | Fast image generation optimized for high-volume, low-latency workflows |
| Nano Banana Pro | Professional 4K image generation with text rendering and search grounding |
| Live | Real-time voice and video streaming over WebSocket connections |
| TTS (Text-to-Speech) | High-quality speech synthesis with multiple voice options |
| Embeddings | Dense vector representations for semantic search and retrieval |
| Deep Research | Extended multi-step research workflows with automatic source gathering |
| Computer Use | Browser automation through screenshots and UI action commands |
| Robotics | Physical world interaction and robotic control planning |

Decision Framework for Choosing the Right Model

Selecting the right Gemini model depends on four key dimensions: task complexity, latency requirements, cost sensitivity, and feature needs.

By Task Complexity

  • Simple tasks (classification, extraction, formatting): Use 2.5 Flash-Lite for maximum cost efficiency, or 3 Flash with thinkingLevel: minimal if you need the latest capabilities.
  • Moderate tasks (summarization, Q&A, content generation): Use 2.5 Flash with dynamic thinking, or 3 Flash with thinkingLevel: medium.
  • Complex tasks (multi-step reasoning, code generation, mathematical proofs): Use 2.5 Pro or 3 Pro with high thinking budgets.
  • Most demanding tasks (competition math, novel algorithm design, deep analysis): Use 2.5 Pro with Deep Think (budget 32768) or 3.1 Pro with thinkingLevel: high.
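The complexity tiers above amount to a small routing table. The mapping below is illustrative (the API IDs are the ones listed earlier; the tier names and thinking values are this sketch's own convention):

```python
# Toy routing table mirroring the task-complexity tiers above.
# Tier names and thinking values are illustrative conventions.
ROUTES = {
    "simple":   ("gemini-2.5-flash-lite", None),       # classification, extraction
    "moderate": ("gemini-2.5-flash", "dynamic"),       # summarization, Q&A
    "complex":  ("gemini-2.5-pro", "high"),            # multi-step reasoning, code
    "hardest":  ("gemini-3.1-pro-preview", "high"),    # competition math, deep analysis
}

def pick_model(complexity: str) -> dict:
    model, thinking = ROUTES[complexity]
    return {"model": model, "thinking": thinking}

print(pick_model("simple")["model"])   # gemini-2.5-flash-lite
print(pick_model("complex")["model"])  # gemini-2.5-pro
```

In practice many teams start every task at the cheapest tier and promote it only when evaluation shows the quality gap, rather than routing by intuition.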

By Latency Requirements

  • Real-time / sub-second: Use the Live API variants for streaming, or 2.5 Flash-Lite for fastest batch responses.
  • Interactive (1-5 seconds): Use 2.5 Flash or 3 Flash with moderate thinking.
  • Batch / async: Use any model; consider batch API pricing discounts for non-urgent workloads.

By Cost Sensitivity

  • Budget-constrained: Start with 2.5 Flash-Lite ($0.10/$0.40 per MTok) and only upgrade if quality is insufficient.
  • Balanced: Use 2.5 Flash ($0.30/$2.50 per MTok) for the best quality-per-dollar with reasoning.
  • Quality-first: Use 2.5 Pro or 3.1 Pro when output quality justifies the cost.

Free vs Paid vs Enterprise Tier Comparison

| Feature | Free Tier | Paid Tier | Enterprise (Vertex AI) |
|---|---|---|---|
| Rate limits | 15 RPM / 1,500 RPD | 2,000+ RPM | Custom / negotiated |
| Context caching | Limited | Full access | Full access + SLA |
| Batch API | Not available | 50% discount pricing | Full access |
| Grounding (Search) | 1,500 RPD free | $14-$35 per 1K queries | Volume pricing |
| Support | Community | Standard | Premium / dedicated |
| Data residency | No control | Limited | Full control |
| SLA | None | 99.9% | 99.95%+ |
| Compliance | Basic | SOC 2 | HIPAA, FedRAMP, etc. |

The free tier is suitable for development and testing. The paid tier unlocks production-level rate limits and features. Enterprise (Vertex AI) adds the security, compliance, and operational controls required by large organizations.
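When developing against the free tier's 15 RPM limit, a simple client-side throttle avoids burning requests on 429 errors. A minimal sketch (server-side limits remain authoritative; production code should also handle 429 responses with backoff):

```python
# Client-side pacing for a requests-per-minute limit: space calls at
# least 60/rpm seconds apart. Illustrative; pair with 429 backoff in practice.
import time

class RpmThrottle:
    def __init__(self, rpm: int):
        self.min_interval = 60.0 / rpm
        self.last = 0.0

    def wait(self) -> None:
        now = time.monotonic()
        sleep_for = self.last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last = time.monotonic()

throttle = RpmThrottle(rpm=15)  # free-tier limit from the table above
# Call throttle.wait() immediately before each API request.
```

At 15 RPM this yields one request every 4 seconds, which is fine for development loops but is the clearest signal that production traffic belongs on the paid tier.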


