Effective Context Engineering for AI Agents
by Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, Jeremy Hadfield
A comprehensive guide to context engineering for AI agents, covering the shift from prompt engineering to managing optimal token sets during inference: techniques for long-horizon tasks, context retrieval strategies, multi-agent architectures, and practical principles for treating context as a finite resource.
Authors: Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, and Jeremy Hadfield (Anthropic Applied AI Team)
Contributors: Rafi Ayub, Hannah Moran, Cal Rueb, and Connor Jennings
Source: Anthropic Engineering Blog, September 29, 2025
Chapter 1: Introduction -- From Prompt Engineering to Context Engineering
Building with language models has evolved beyond simply perfecting the wording of prompts. The new paradigm is context engineering -- curating and maintaining the optimal set of tokens during LLM inference.
What Is Context?
Context is the set of tokens included when sampling from a large language model. This includes everything the model can see at inference time: system instructions, tool definitions, message history, retrieved data, and more.
What Is Context Engineering?
Context engineering encompasses the strategies for curating and maintaining the optimal set of information during LLM inference, including all the data that may land in the context window outside of the prompts themselves.
The core mindset is thinking in context -- considering the holistic state available to the LLM at any given time and what potential behaviors that state might yield.
Why the Shift Matters
For discrete, single-turn tasks like classification or text generation, prompt engineering (writing and organizing LLM instructions for optimal outcomes) works well. But for multi-turn agents with extended task horizons, the challenge expands dramatically. An agent running in a loop generates more and more data that could be relevant, requiring cyclical refinement of what enters the context window.
Context engineering is the art and science of curating what will go into the limited context window at each step of an agent's operation.
Chapter 2: Why Context Engineering Matters for Capable Agents
Three fundamental challenges make context engineering essential for building reliable AI agents.
The Context Rot Problem
Research has shown that as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases. This phenomenon, called context rot, affects all models to varying degrees. Simply stuffing more information into the context window does not reliably improve performance -- it can actively degrade it.
The Finite Attention Budget
Just as humans have limited working memory capacity, LLMs have an attention budget that is depleted by each new token added to the context. Every token added consumes some of this budget. Context must therefore be treated as a finite resource with diminishing marginal returns -- each additional piece of information needs to justify its cost.
Architectural Constraints of Transformers
The transformer architecture enables every token to attend to every other token across the entire context, creating n-squared pairwise relationships for n tokens. As context expands, models struggle to maintain these relationships effectively. Additionally, models are trained predominantly on shorter sequences, leaving them less experienced with lengthy contexts.
Techniques like position encoding interpolation help extend effective context length, but they create a performance gradient rather than a hard cliff -- performance degrades gradually rather than cutting off suddenly, but the degradation is real and cumulative.
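To make the scaling concrete, here is an illustrative back-of-the-envelope sketch of how the number of pairwise attention relationships grows with context length:

```python
# Illustrative only: count the pairwise relationships full self-attention
# must model for a context of n tokens (every token attends to every
# token, including itself), showing quadratic growth.

def attention_pairs(n_tokens: int) -> int:
    """Number of pairwise relationships in full self-attention."""
    return n_tokens * n_tokens

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_pairs(n):,} pairwise relationships")
```

Going from 10,000 to 100,000 tokens multiplies the relationships the model must track by 100, not 10, which is why long contexts strain attention even before hard limits are reached.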
Chapter 3: The Anatomy of Effective Context
The guiding principle of context engineering is finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome. This chapter examines each component of effective context.
System Prompts
System prompts should be extremely clear and use simple, direct language. The key challenge is finding the right altitude -- between brittle, over-specified logic and vague, underspecified guidance.
The optimal system prompt is specific enough to guide behavior effectively, yet flexible enough to provide the model with strong heuristics for handling situations not explicitly covered.
Structural best practices:
- Organize into distinct sections using XML tags or Markdown headers
- Use clear section markers like <background_information>, <instructions>, ## Tool guidance, ## Output description
- Strive for the minimal set of information that fully outlines expected behavior
- Start with minimal prompts, then add instructions based on observed failure modes
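As an illustration of these structural practices, here is a hypothetical system-prompt skeleton; the domain (CI build triage) and the exact section names are invented for the example, not a prescribed schema:

```python
# Hypothetical system-prompt skeleton: distinct sections via XML tags and
# Markdown headers, each earning its place (background, behavior, tools,
# output format). Content is illustrative only.

SYSTEM_PROMPT = """\
<background_information>
You assist engineers with triaging build failures in a CI pipeline.
</background_information>

<instructions>
Diagnose the failure, propose a fix, and cite the log lines you used.
If the logs are ambiguous, ask for the specific log section you need.
</instructions>

## Tool guidance
Use read_log for targeted excerpts; avoid fetching entire log files.

## Output description
Reply with a short diagnosis followed by a numbered fix plan.
"""
```

Starting from a skeleton this small, new instructions are added only when an observed failure mode justifies them.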
Tools
Tools are a critical part of an agent's context. They must:
- Promote efficiency through token-conscious returns and encourage efficient agent behaviors
- Be self-contained and robust to error -- tools should handle edge cases gracefully
- Be extremely clear about intended use -- ambiguous tool descriptions lead to misuse
- Have descriptive and unambiguous parameters -- each parameter's purpose and format should be obvious
A common failure mode is providing bloated tool sets with overlapping functionality. If a human engineer cannot definitively say which tool should be used in a given situation, an AI agent cannot be expected to do better. Tool sets should be curated with the same care as system prompts.
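A sketch of what a well-scoped tool definition might look like, using a JSON-schema-style structure; the tool name, fields, and limits here are hypothetical:

```python
# Hypothetical tool definition following the guidance above: one clear
# purpose, unambiguous parameters, and a token-conscious return contract.

search_issues_tool = {
    "name": "search_issues",
    "description": (
        "Search the issue tracker and return at most `limit` matching "
        "issues as one-line summaries (id, title, status). Use this for "
        "discovery; fetch full issue bodies with a separate tool only "
        "when needed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keywords to match against issue titles and bodies.",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum results to return (1-20).",
                "minimum": 1,
                "maximum": 20,
            },
        },
        "required": ["query"],
    },
}
```

Note how the description states both what the tool returns and when to prefer it over a sibling tool, so there is no overlap for the agent to guess about.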
Examples and Few-Shot Prompting
Rather than listing every possible edge case exhaustively, the most effective approach is to curate a set of diverse, canonical examples that effectively portray the expected behavior of the agent. For LLMs, examples are the equivalent of pictures worth a thousand words -- they communicate expected patterns more efficiently than lengthy rule descriptions.
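A minimal sketch of that idea for a hypothetical classification task: a few diverse, canonical input/label pairs rendered compactly, rather than an exhaustive rulebook:

```python
# Hypothetical few-shot setup: the task, examples, and labels are invented
# for illustration. Each example is short, distinct, and canonical.

FEW_SHOT_EXAMPLES = [
    {"input": "Refund my order, it arrived broken.", "label": "refund_request"},
    {"input": "Where is my package?", "label": "shipping_status"},
    {"input": "Cancel my subscription today.", "label": "cancellation"},
]

def render_examples(examples: list[dict]) -> str:
    """Format examples as compact prompt text."""
    return "\n".join(
        f"Input: {e['input']}\nLabel: {e['label']}" for e in examples
    )
```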
The Unifying Principle
Across all context components -- system prompts, tools, examples, and message history -- the guidance is the same: be thoughtful and keep your context informative, yet tight. Every token should earn its place.
Chapter 4: Context Retrieval and Agentic Search
As agents become more capable, how they find and load information becomes as important as how that information is structured.
Defining Agents
The field converges on a working definition: LLMs autonomously using tools in a loop. As models improve, agent autonomy scales -- smarter models handle more nuanced problems and recover from errors more independently.
Just-In-Time Context Strategies
Rather than pre-processing and loading all potentially relevant data into the context upfront, effective agents maintain lightweight identifiers -- file paths, stored queries, web links, and other pointers -- and dynamically load detailed data via tools at runtime only when needed.
Example from Claude Code: For data analysis tasks, Claude Code writes targeted queries and uses Bash commands to analyze large datasets without loading full data objects into the context window. This mirrors how humans work: we use external organization and indexing systems like file systems, inboxes, and bookmarks to retrieve relevant information on demand rather than memorizing everything.
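The pointer-then-load pattern can be sketched as follows; the class name and the bounded-excerpt policy are illustrative assumptions, not Claude Code's actual implementation:

```python
# Sketch of just-in-time context: the agent carries a lightweight
# reference and loads bounded content only when a step needs it.

from pathlib import Path

class LazyFileRef:
    """A pointer the agent can hold cheaply; content loads on demand."""

    def __init__(self, path: str):
        self.path = Path(path)

    def metadata(self) -> str:
        # Metadata alone is often enough signal to decide whether to load.
        stat = self.path.stat()
        return f"{self.path} ({stat.st_size} bytes)"

    def load(self, max_chars: int = 2_000) -> str:
        # Load a bounded excerpt, not the whole object, into context.
        return self.path.read_text()[:max_chars]
```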
Metadata as Signal
File paths, naming conventions, timestamps, and other metadata provide important signals that help both humans and agents understand how and when to utilize information. Well-organized metadata enables agents to make better decisions about what to retrieve without loading the full content.
Progressive Disclosure
Autonomous retrieval enables agents to incrementally discover relevant context through exploration, assembling understanding layer by layer while maintaining only what is necessary in working memory. This progressive approach prevents context bloat while allowing thorough investigation.
The Speed-Autonomy Trade-off
Runtime exploration is inherently slower than pre-computed retrieval. Building effective agentic search requires opinionated and thoughtful engineering to ensure LLMs have the right tools and heuristics for effectively navigating their information landscape.
Hybrid Strategies
The most effective approaches balance speed (pre-retrieved data loaded upfront) with autonomy (just-in-time exploration at runtime).
Claude Code as exemplar: CLAUDE.md files load context upfront at the start of each session, while glob and grep tools enable runtime exploration when the agent needs to discover additional information. This hybrid approach provides a strong foundation of context while preserving the ability to explore.
The practical advice: do the simplest thing that works until model capabilities justify more sophisticated approaches. Over-engineering context retrieval systems often introduces more problems than it solves.
Chapter 5: Context Engineering for Long-Horizon Tasks
Long-horizon tasks -- those running for tens of minutes to hours -- require agents to maintain coherence across token counts that may exceed the context window. Three primary techniques address this challenge.
Technique 1: Compaction
Compaction involves taking a conversation nearing the context window limit, summarizing its contents, and reinitiating a new context window with the summary. This distills the contents in a high-fidelity manner, enabling the agent to continue with minimal performance degradation.
How Claude Code implements compaction: The model summarizes critical details while discarding redundant tool outputs or messages. The agent continues with the compressed context plus the five most recently accessed files, ensuring continuity of the most immediately relevant information.
Key considerations for effective compaction:
- Avoid aggressive compaction that loses subtle but important context
- Maximize recall initially to capture all relevant information in the summary
- Iterate to improve precision by eliminating superfluous content over time
- Safe compaction targets: Clearing tool calls and results is generally safe, as these tend to be verbose and their essential information can be captured in summaries
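Putting these considerations together, here is a minimal compaction sketch, assuming a summarization step (stubbed here in place of an LLM call) and a rough 4-characters-per-token estimate; the threshold and keep-recent policy are illustrative, not Claude Code's exact logic:

```python
# Minimal compaction sketch: when the history nears the token limit,
# replace older turns with a summary and keep the most recent turns intact.

def estimate_tokens(messages: list[str]) -> int:
    # Rough heuristic: ~4 characters per token.
    return sum(len(m) for m in messages) // 4

def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM summarization call over the older messages.
    return f"[summary of {len(messages)} earlier messages]"

def compact(messages: list[str], limit: int = 100_000,
            keep_recent: int = 5) -> list[str]:
    """Compact the history if it nears the context limit."""
    if estimate_tokens(messages) < limit:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```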
Technique 2: Structured Note-Taking (Agentic Memory)
Rather than relying solely on compaction, agents can regularly write notes persisted to memory outside of the context window. These notes are pulled back into the context window at later times when needed.
This approach provides persistent memory with minimal overhead -- the notes themselves are compact, but the information they preserve can be critical.
Examples of agentic memory in practice:
- Claude Code maintains to-do lists that track progress across long development sessions
- Claude playing Pokemon tracks precise tallies across thousands of game steps, maintains maps of explored areas, and records strategic notes about game state
- Development agents maintain NOTES.md files that persist architectural decisions, discovered constraints, and implementation plans
Structured note-taking enables long-horizon strategies that would be impossible when keeping all the information in the LLM's context window alone. By externalizing memory, agents can operate effectively across arbitrarily long task horizons.
Recent platform development: Anthropic launched a memory tool in public beta on the Claude Developer Platform using a file-based system for storing and consulting information outside context windows.
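A file-based sketch of the idea, assuming a NOTES.md convention like the one above; the helper names and note format are hypothetical:

```python
# Sketch of structured note-taking: persist compact notes outside the
# context window and reload them only when the agent needs them.

from pathlib import Path

NOTES_PATH = Path("NOTES.md")

def append_note(section: str, text: str) -> None:
    """Persist a decision or constraint so it survives context resets."""
    with NOTES_PATH.open("a") as f:
        f.write(f"## {section}\n{text}\n\n")

def load_notes() -> str:
    """Pull notes back into the context window on demand."""
    return NOTES_PATH.read_text() if NOTES_PATH.exists() else ""
```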
Technique 3: Sub-Agent Architectures
In multi-agent architectures, specialized sub-agents handle focused tasks with clean, dedicated context windows while a main orchestrating agent coordinates the overall effort. Each sub-agent can explore extensively within its domain but returns a condensed, distilled summary of its work -- often just 1,000 to 2,000 tokens.
This creates a clear separation of concerns: the detailed search or analysis context remains isolated within sub-agents, while the lead agent focuses on synthesizing and analyzing the high-level results. The lead agent's context stays clean and focused, avoiding the context rot that would result from loading all the detailed exploration directly.
Results: Multi-agent research systems have shown substantial improvement over single-agent systems on complex research tasks, particularly those requiring parallel exploration of multiple information sources.
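The orchestrator/sub-agent flow can be sketched as follows, with a stub standing in for the actual model call; the function names and the ~4-characters-per-token cap are assumptions for illustration:

```python
# Toy sub-agent pattern: each sub-agent works in its own fresh context
# and returns only a condensed summary to the orchestrating agent.

def run_llm(context: list[str]) -> str:
    # Stand-in for a real LLM call over the given context.
    return f"findings from {len(context)} context items"

def run_subagent(task: str, max_summary_tokens: int = 2_000) -> str:
    """Explore extensively in an isolated context; return only a summary."""
    context = ["system: you research one focused task", f"task: {task}"]
    # ...the sub-agent may add many exploration steps to its own context...
    detailed_result = run_llm(context)
    return detailed_result[: max_summary_tokens * 4]  # ~4 chars/token cap

def orchestrate(tasks: list[str]) -> list[str]:
    """The lead agent sees only condensed summaries, keeping its context clean."""
    return [run_subagent(t) for t in tasks]
```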
Choosing the Right Approach
Each technique suits different scenarios:
| Technique | Best For | Key Strength |
|---|---|---|
| Compaction | Extensive back-and-forth conversations | Maintains conversational flow and continuity |
| Note-taking | Iterative development with clear milestones | Preserves critical decisions and state across long horizons |
| Multi-agent | Complex research and analysis | Parallel exploration with clean context separation |
These approaches are not mutually exclusive. The most robust long-horizon agents often combine all three: compaction to manage conversation length, note-taking to preserve critical state, and sub-agents to handle complex subtasks with dedicated focus.
Chapter 6: Practical Principles and Conclusion
Core Principles of Context Engineering
1. Context is a finite resource. Treat every token as having a cost. The attention budget is real and measurable.
2. Find the smallest high-signal set. The unifying principle across all techniques: find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.
3. Think holistically. Context engineering considers the entire state available to the model -- not just the prompt, but tools, examples, message history, retrieved data, and metadata.
4. Balance pre-loading with exploration. Hybrid strategies that combine upfront context with just-in-time retrieval tend to outperform pure approaches in either direction.
5. Externalize when appropriate. Not all information needs to live in the context window. Notes, files, databases, and sub-agent summaries can extend effective memory far beyond window limits.
6. Start simple, add complexity based on failure modes. Do the simplest thing that works. Over-engineering context management often introduces more problems than it solves.
The Future of Context Engineering
As models improve, they require less prescriptive engineering, allowing agents to operate with more autonomy. Better models can navigate ambiguity, recover from errors, and make effective decisions with less hand-holding.
Yet the fundamental principle will endure: treating context as a precious, finite resource will remain central to building reliable, effective agents. Even as context windows grow and model capabilities advance, the attention budget remains finite, and the discipline of curating optimal context will continue to differentiate the most capable AI systems from the merely functional.
Summary
Context engineering represents a fundamental shift in how we build with LLMs. The transition from prompt engineering to context engineering reflects the evolution from single-turn interactions to autonomous, long-running agents. Rather than crafting the perfect prompt, the challenge involves thoughtfully curating what information enters the model's limited attention budget at each step of its operation.
The techniques covered -- from careful system prompt design, to just-in-time retrieval, to compaction, structured note-taking, and multi-agent architectures -- form a toolkit for managing this challenge effectively. The best practitioners combine these techniques fluidly, matching the approach to the requirements of each task.
Appendix: Key Concepts Reference
| Concept | Definition |
|---|---|
| Context | The set of tokens included when sampling from an LLM |
| Context Engineering | Strategies for curating optimal token sets during inference |
| Context Rot | Degradation of recall accuracy as context length increases |
| Attention Budget | The finite capacity of a model to attend to context tokens |
| Compaction | Summarizing context near window limits to continue operation |
| Agentic Memory | External note-taking that persists beyond the context window |
| Progressive Disclosure | Incrementally discovering context through exploration |
| Just-In-Time Context | Loading detailed data via tools only when needed at runtime |
| Sub-Agent Architecture | Delegating focused tasks to specialized agents with clean context |
This book is based on the Anthropic engineering blog post "Effective Context Engineering for AI Agents" published September 29, 2025. Original authors: Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, and Jeremy Hadfield, with contributions from Rafi Ayub, Hannah Moran, Cal Rueb, and Connor Jennings.