Effective Context Engineering for AI Agents
by Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, Jeremy Hadfield
A comprehensive guide to context engineering for AI agents, covering the shift from prompt engineering to managing optimal token sets during inference: techniques for long-horizon tasks, context retrieval strategies, multi-agent architectures, and practical principles for treating context as a finite resource.
Authors: Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, and Jeremy Hadfield (Anthropic Applied AI Team)
Contributors: Rafi Ayub, Hannah Moran, Cal Rueb, and Connor Jennings
Source: Anthropic Engineering Blog, September 29, 2025
Chapter 1: Introduction -- From Prompt Engineering to Context Engineering
Building with language models has evolved beyond simply perfecting the wording of prompts. The new paradigm is context engineering -- curating and maintaining the optimal set of tokens during LLM inference.
What Is Context?
Context is the set of tokens included when sampling from a large language model. This includes everything the model can see at inference time: system instructions, tool definitions, message history, retrieved data, and more.
What Is Context Engineering?
Context engineering encompasses the strategies for curating and maintaining the optimal set of information during LLM inference, including all the data that may land in the context window outside of the prompts themselves.
The core mindset is thinking in context -- considering the holistic state available to the LLM at any given time and what potential behaviors that state might yield.
Why the Shift Matters
For discrete, single-turn tasks like classification or text generation, prompt engineering (writing and organizing LLM instructions for optimal outcomes) works well. But for multi-turn agents with extended task horizons, the challenge expands dramatically. An agent running in a loop generates more and more data that could be relevant, requiring cyclical refinement of what enters the context window.
Context engineering is the art and science of curating what will go into the limited context window at each step of an agent's operation.
Chapter 2: Why Context Engineering Matters for Capable Agents
Three fundamental challenges make context engineering essential for building reliable AI agents.
The Context Rot Problem
Research has shown that as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases. This phenomenon, called context rot, affects all models to varying degrees. Simply stuffing more information into the context window does not reliably improve performance -- it can actively degrade it.
The Finite Attention Budget
Just as humans have limited working memory capacity, LLMs have an attention budget that is depleted by each new token added to the context. Every token added consumes some of this budget. Context must therefore be treated as a finite resource with diminishing marginal returns -- each additional piece of information needs to justify its cost.
Architectural Constraints of Transformers
The transformer architecture enables every token to attend to every other token across the entire context, creating n-squared pairwise relationships for n tokens. As context expands, models struggle to maintain these relationships effectively. Additionally, models are trained predominantly on shorter sequences, leaving them less experienced with lengthy contexts.
Techniques like position encoding interpolation help extend effective context length, but they create a performance gradient rather than a hard cliff -- performance degrades gradually rather than cutting off suddenly, but the degradation is real and cumulative.
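To make the scaling concrete, here is an illustrative back-of-the-envelope sketch of how the number of pairwise attention relationships grows with context length:

```python
# Illustrative only: count the pairwise relationships full self-attention
# must model for a context of n tokens (every token attends to every
# token, including itself), showing quadratic growth.

def attention_pairs(n_tokens: int) -> int:
    """Number of pairwise relationships in full self-attention."""
    return n_tokens * n_tokens

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_pairs(n):,} pairwise relationships")
```

Going from 10,000 to 100,000 tokens multiplies the relationships the model must track by 100, not 10, which is why long contexts strain attention even before hard limits are reached.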
Chapter 3: The Anatomy of Effective Context
The guiding principle of context engineering is finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome. This chapter examines each component of effective context.
System Prompts
System prompts should be extremely clear and use simple, direct language. The key challenge is finding the right altitude -- between brittle, over-specified logic and vague, underspecified guidance.
The optimal system prompt is specific enough to guide behavior effectively, yet flexible enough to provide the model with strong heuristics for handling situations not explicitly covered.
Structural best practices:
- Organize into distinct sections using XML tags or Markdown headers
- Use clear section markers like <background_information>, <instructions>, ## Tool guidance, ## Output description
- Strive for the minimal set of information that fully outlines expected behavior
- Start with minimal prompts, then add instructions based on observed failure modes
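As an illustration of these structural practices, here is a hypothetical system-prompt skeleton; the domain (CI build triage) and the exact section names are invented for the example, not a prescribed schema:

```python
# Hypothetical system-prompt skeleton: distinct sections via XML tags and
# Markdown headers, each earning its place (background, behavior, tools,
# output format). Content is illustrative only.

SYSTEM_PROMPT = """\
<background_information>
You assist engineers with triaging build failures in a CI pipeline.
</background_information>

<instructions>
Diagnose the failure, propose a fix, and cite the log lines you used.
If the logs are ambiguous, ask for the specific log section you need.
</instructions>

## Tool guidance
Use read_log for targeted excerpts; avoid fetching entire log files.

## Output description
Reply with a short diagnosis followed by a numbered fix plan.
"""
```

Starting from a skeleton this small, new instructions are added only when an observed failure mode justifies them.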
Tools
Tools are a critical part of an agent's context. They must:
- Promote efficiency through token-conscious returns and encourage efficient agent behaviors
- Be self-contained and robust to error -- tools should handle edge cases gracefully
- Be extremely clear about intended use -- ambiguous tool descriptions lead to misuse
- Have descriptive and unambiguous parameters -- each parameter's purpose and format should be obvious
A common failure mode is providing bloated tool sets with overlapping functionality. If a human engineer cannot definitively say which tool should be used in a given situation, an AI agent cannot be expected to do better. Tool sets should be curated with the same care as system prompts.
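A sketch of what a well-scoped tool definition might look like, using a JSON-schema-style structure; the tool name, fields, and limits here are hypothetical:

```python
# Hypothetical tool definition following the guidance above: one clear
# purpose, unambiguous parameters, and a token-conscious return contract.

search_issues_tool = {
    "name": "search_issues",
    "description": (
        "Search the issue tracker and return at most `limit` matching "
        "issues as one-line summaries (id, title, status). Use this for "
        "discovery; fetch full issue bodies with a separate tool only "
        "when needed."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keywords to match against issue titles and bodies.",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum results to return (1-20).",
                "minimum": 1,
                "maximum": 20,
            },
        },
        "required": ["query"],
    },
}
```

Note how the description states both what the tool returns and when to prefer it over a sibling tool, so there is no overlap for the agent to guess about.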
Examples and Few-Shot Prompting
Rather than listing every possible edge case exhaustively, the most effective approach is to curate a set of diverse, canonical examples that effectively portray the expected behavior of the agent. For LLMs, examples are the equivalent of pictures worth a thousand words -- they communicate expected patterns more efficiently than lengthy rule descriptions.
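A minimal sketch of that idea for a hypothetical classification task: a few diverse, canonical input/label pairs rendered compactly, rather than an exhaustive rulebook:

```python
# Hypothetical few-shot setup: the task, examples, and labels are invented
# for illustration. Each example is short, distinct, and canonical.

FEW_SHOT_EXAMPLES = [
    {"input": "Refund my order, it arrived broken.", "label": "refund_request"},
    {"input": "Where is my package?", "label": "shipping_status"},
    {"input": "Cancel my subscription today.", "label": "cancellation"},
]

def render_examples(examples: list[dict]) -> str:
    """Format examples as compact prompt text."""
    return "\n".join(
        f"Input: {e['input']}\nLabel: {e['label']}" for e in examples
    )
```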
The Unifying Principle
Across all context components -- system prompts, tools, examples, and message history -- the guidance is the same: be thoughtful and keep your context informative, yet tight. Every token should earn its place.
Chapter 4: Context Retrieval and Agentic Search
As agents become more capable, how they find and load information becomes as important as how that information is structured.
Defining Agents
The field converges on a working definition: LLMs autonomously using tools in a loop. As models improve, agent autonomy scales -- smarter models handle more nuanced problems and recover from errors more independently.
Just-In-Time Context Strategies
Rather than pre-processing and loading all potentially relevant data into the context upfront, effective agents maintain lightweight identifiers -- file paths, stored queries, web links, and other pointers -- and dynamically load detailed data via tools at runtime only when needed.
Example from Claude Code: For data analysis tasks, Claude Code writes targeted queries and uses Bash commands to analyze large datasets without loading full data objects into the context window. This mirrors how humans work: we use external organization and indexing systems like file systems, inboxes, and bookmarks to retrieve relevant information on demand rather than memorizing everything.
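The pointer-then-load pattern can be sketched as follows; the class name and the bounded-excerpt policy are illustrative assumptions, not Claude Code's actual implementation:

```python
# Sketch of just-in-time context: the agent carries a lightweight
# reference and loads bounded content only when a step needs it.

from pathlib import Path

class LazyFileRef:
    """A pointer the agent can hold cheaply; content loads on demand."""

    def __init__(self, path: str):
        self.path = Path(path)

    def metadata(self) -> str:
        # Metadata alone is often enough signal to decide whether to load.
        stat = self.path.stat()
        return f"{self.path} ({stat.st_size} bytes)"

    def load(self, max_chars: int = 2_000) -> str:
        # Load a bounded excerpt, not the whole object, into context.
        return self.path.read_text()[:max_chars]
```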
Metadata as Signal
File paths, naming conventions, timestamps, and other metadata provide important signals that help both humans and agents understand how and when to utilize information. Well-organized metadata enables agents to make better decisions about what to retrieve without loading the full content.
Progressive Disclosure
Autonomous retrieval enables agents to incrementally discover relevant context through exploration, assembling understanding layer by layer while maintaining only what is necessary in working memory. This progressive approach prevents context bloat while allowing thorough investigation.
The Speed-Autonomy Trade-off
Runtime exploration is inherently slower than pre-computed retrieval. Building effective agentic search requires opinionated and thoughtful engineering to ensure LLMs have the right tools and heuristics for effectively navigating their information landscape.
Hybrid Strategies
The most effective approaches balance speed (pre-retrieved data loaded upfront) with autonomy (just-in-time exploration at runtime).
Claude Code as exemplar: CLAUDE.md files load context upfront at the start of each session, while glob and grep tools enable runtime exploration when the agent needs to discover additional information. This hybrid approach provides a strong foundation of context while preserving the ability to explore.
The practical advice: do the simplest thing that works until model capabilities justify more sophisticated approaches. Over-engineering context retrieval systems often introduces more problems than it solves.
Chapter 5: Context Engineering for Long-Horizon Tasks
Long-horizon tasks -- those running for tens of minutes to hours -- require agents to maintain coherence across token counts that may exceed the context window. Three primary techniques address this challenge.
Technique 1: Compaction
Compaction involves taking a conversation nearing the context window limit, summarizing its contents, and reinitiating a new context window with the summary. This distills the contents in a high-fidelity manner, enabling the agent to continue with minimal performance degradation.
How Claude Code implements compaction: The model summarizes critical details while discarding redundant tool outputs or messages. The agent continues with the compressed context plus the five most recently accessed files, ensuring continuity of the most immediately relevant information.
Key considerations for effective compaction:
- Avoid aggressive compaction that loses subtle but important context
- Maximize recall initially to capture all relevant information in the summary
- Iterate to improve precision by eliminating superfluous content over time
- Safe compaction targets: Clearing tool calls and results is generally safe, as these tend to be verbose and their essential information can be captured in summaries
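Putting these considerations together, here is a minimal compaction sketch, assuming a summarization step (stubbed here in place of an LLM call) and a rough 4-characters-per-token estimate; the threshold and keep-recent policy are illustrative, not Claude Code's exact logic:

```python
# Minimal compaction sketch: when the history nears the token limit,
# replace older turns with a summary and keep the most recent turns intact.

def estimate_tokens(messages: list[str]) -> int:
    # Rough heuristic: ~4 characters per token.
    return sum(len(m) for m in messages) // 4

def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM summarization call over the older messages.
    return f"[summary of {len(messages)} earlier messages]"

def compact(messages: list[str], limit: int = 100_000,
            keep_recent: int = 5) -> list[str]:
    """Compact the history if it nears the context limit."""
    if estimate_tokens(messages) < limit:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [summarize(older)] + recent
```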
Technique 2: Structured Note-Taking (Agentic Memory)
Rather than relying solely on compaction, agents can regularly write notes persisted to memory outside of the context window. These notes are pulled back into the context window at later times when needed.
This approach provides persistent memory with minimal overhead -- the notes themselves are compact, but the information they preserve can be critical.
Examples of agentic memory in practice:
- Claude Code maintains to-do lists that track progress across long development sessions
- Claude playing Pokemon tracks precise tallies across thousands of game steps, maintains maps of explored areas, and records strategic notes about game state
- Development agents maintain NOTES.md files that persist architectural decisions, discovered constraints, and implementation plans
Structured note-taking enables long-horizon strategies that would be impossible when keeping all the information in the LLM's context window alone. By externalizing memory, agents can operate effectively across arbitrarily long task horizons.
Recent platform development: Anthropic launched a memory tool in public beta on the Claude Developer Platform using a file-based system for storing and consulting information outside context windows.
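A file-based sketch of the idea, assuming a NOTES.md convention like the one above; the helper names and note format are hypothetical:

```python
# Sketch of structured note-taking: persist compact notes outside the
# context window and reload them only when the agent needs them.

from pathlib import Path

NOTES_PATH = Path("NOTES.md")

def append_note(section: str, text: str) -> None:
    """Persist a decision or constraint so it survives context resets."""
    with NOTES_PATH.open("a") as f:
        f.write(f"## {section}\n{text}\n\n")

def load_notes() -> str:
    """Pull notes back into the context window on demand."""
    return NOTES_PATH.read_text() if NOTES_PATH.exists() else ""
```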
Technique 3: Sub-Agent Architectures
In multi-agent architectures, specialized sub-agents handle focused tasks with clean, dedicated context windows while a main orchestrating agent coordinates the overall effort. Each sub-agent can explore extensively within its domain but returns a condensed, distilled summary of its work -- often just 1,000 to 2,000 tokens.
This creates a clear separation of concerns: the detailed search or analysis context remains isolated within sub-agents, while the lead agent focuses on synthesizing and analyzing the high-level results. The lead agent's context stays clean and focused, avoiding the context rot that would result from loading all the detailed exploration directly.
Results: Multi-agent research systems have shown substantial improvement over single-agent systems on complex research tasks, particularly those requiring parallel exploration of multiple information sources.
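The orchestrator/sub-agent flow can be sketched as follows, with a stub standing in for the actual model call; the function names and the ~4-characters-per-token cap are assumptions for illustration:

```python
# Toy sub-agent pattern: each sub-agent works in its own fresh context
# and returns only a condensed summary to the orchestrating agent.

def run_llm(context: list[str]) -> str:
    # Stand-in for a real LLM call over the given context.
    return f"findings from {len(context)} context items"

def run_subagent(task: str, max_summary_tokens: int = 2_000) -> str:
    """Explore extensively in an isolated context; return only a summary."""
    context = ["system: you research one focused task", f"task: {task}"]
    # ...the sub-agent may add many exploration steps to its own context...
    detailed_result = run_llm(context)
    return detailed_result[: max_summary_tokens * 4]  # ~4 chars/token cap

def orchestrate(tasks: list[str]) -> list[str]:
    """The lead agent sees only condensed summaries, keeping its context clean."""
    return [run_subagent(t) for t in tasks]
```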
Choosing the Right Approach
Each technique suits different scenarios:
| Technique | Best For | Key Strength |
|---|---|---|
| Compaction | Extensive back-and-forth conversations | Maintains conversational flow and continuity |
| Note-taking | Iterative development with clear milestones | Preserves critical decisions and state across long horizons |
| Multi-agent | Complex research and analysis | Parallel exploration with clean context separation |
These approaches are not mutually exclusive. The most robust long-horizon agents often combine all three: compaction to manage conversation length, note-taking to preserve critical state, and sub-agents to handle complex subtasks with dedicated focus.
Chapter 6: Practical Principles and Conclusion
Core Principles of Context Engineering
1. Context is a finite resource. Treat every token as having a cost. The attention budget is real and measurable.
2. Find the smallest high-signal set. The unifying principle across all techniques: find the smallest set of high-signal tokens that maximize the likelihood of your desired outcome.
3. Think holistically. Context engineering considers the entire state available to the model -- not just the prompt, but tools, examples, message history, retrieved data, and metadata.
4. Balance pre-loading with exploration. Hybrid strategies that combine upfront context with just-in-time retrieval tend to outperform pure approaches in either direction.
5. Externalize when appropriate. Not all information needs to live in the context window. Notes, files, databases, and sub-agent summaries can extend effective memory far beyond window limits.
6. Start simple, add complexity based on failure modes. Do the simplest thing that works. Over-engineering context management often introduces more problems than it solves.
The Future of Context Engineering
As models improve, they require less prescriptive engineering, allowing agents to operate with more autonomy. Better models can navigate ambiguity, recover from errors, and make effective decisions with less hand-holding.
Yet the fundamental principle will endure: treating context as a precious, finite resource will remain central to building reliable, effective agents. Even as context windows grow and model capabilities advance, the attention budget remains finite, and the discipline of curating optimal context will continue to differentiate the most capable AI systems from the merely functional.
Summary
Context engineering represents a fundamental shift in how we build with LLMs. The transition from prompt engineering to context engineering reflects the evolution from single-turn interactions to autonomous, long-running agents. Rather than crafting the perfect prompt, the challenge involves thoughtfully curating what information enters the model's limited attention budget at each step of its operation.
The techniques covered -- from careful system prompt design, to just-in-time retrieval, to compaction, structured note-taking, and multi-agent architectures -- form a toolkit for managing this challenge effectively. The best practitioners combine these techniques fluidly, matching the approach to the requirements of each task.
Appendix: Key Concepts Reference
| Concept | Definition |
|---|---|
| Context | The set of tokens included when sampling from an LLM |
| Context Engineering | Strategies for curating optimal token sets during inference |
| Context Rot | Degradation of recall accuracy as context length increases |
| Attention Budget | The finite capacity of a model to attend to context tokens |
| Compaction | Summarizing context near window limits to continue operation |
| Agentic Memory | External note-taking that persists beyond the context window |
| Progressive Disclosure | Incrementally discovering context through exploration |
| Just-In-Time Context | Loading detailed data via tools only when needed at runtime |
| Sub-Agent Architecture | Delegating focused tasks to specialized agents with clean context |
This book is based on the Anthropic engineering blog post "Effective Context Engineering for AI Agents" published September 29, 2025. Original authors: Prithvi Rajasekaran, Ethan Dixon, Carly Ryan, and Jeremy Hadfield, with contributions from Rafi Ayub, Hannah Moran, Cal Rueb, and Connor Jennings.