Dynamic Workflows in Claude Code: A Complete Guide to Multi-Agent Harnesses
Add to your library first to use in Claude Code
About
Complete reference for Claude Code's dynamic workflows feature. Covers the JavaScript orchestration model, six workflow patterns (classify-and-act, fan-out-and-synthesize, adversarial verification, generate-and-filter, tournament, loop-until-done), all major use cases (migrations, research, verification, triage, evals, model routing), prompting strategies, token budgets, and saving/sharing workflows. Based on Anthropic's official blog post by Thariq Shihipar and Sid Bidasaria.
Preview
Dynamic Workflows in Claude Code: A Complete Guide to Multi-Agent Harnesses
Author: Research compilation — Anthropic blog + Claude Code documentation Source: "A harness for every task" — Anthropic Blog, 2026 (Thariq Shihipar & Sid Bidasaria) Target: Claude Code users who want to orchestrate multi-agent workflows for complex tasks Date: 2026-06-03
Part I — What Are Dynamic Workflows?
Chapter 1 — The Problem Workflows Solve
The default Claude Code harness is built for coding tasks — and coding tasks are what most Claude Code work resembles. But certain task classes push beyond what a single context window handles well:
- Long-running tasks that exceed practical context limits
- Massively parallel tasks where independent work would benefit from isolated contexts
- Highly structured tasks that need adversarial checks or rubric-based evaluation
- Tasks requiring different intelligence levels for different subtasks
Before dynamic workflows, these required building custom static harnesses using the Claude Agent SDK or claude -p — generic infrastructure that needed to handle every edge case.
The three failure modes of single-context-window execution:
| Failure Mode | Description | Example |
|---|---|---|
| Agentic laziness | Claude stops before finishing a complex multi-part task and declares it done | Addresses 20 of 50 items in a security review and reports completion |
| Self-preferential bias | Claude prefers its own results when asked to verify or judge them | Code reviewer that wrote the code grades it higher than warranted |
| Goal drift | Gradual loss of fidelity to original objective across many turns; lossy compaction causes edge-case requirements to disappear | "Don't touch the auth module" constraint gets lost after several summarization rounds |
Dynamic workflows combat all three by orchestrating separate Claude instances — each with their own context window, focused on an isolated goal.
Chapter 2 — Dynamic vs Static Workflows
| Property | Static Workflow | Dynamic Workflow |
|---|---|---|
| Created by | Human (pre-written JavaScript/SDK) | Claude itself, on the fly |
| Scope | Generic — must handle all edge cases | Purpose-built for the specific task |
| Flexibility | Fixed structure | Adapts to task requirements |
| Quality | Good for known, repeatable patterns | Best for novel or complex tasks |
| Setup | Requires engineering effort | Just ask Claude, or say "ultracode" |
| Reusability | Highly reusable | Can be saved and shared as templates |
| Requires | Claude Agent SDK or claude -p | Built into Claude Code |
Static workflows shine when you have a known, repeatable process that should always run the same way. Dynamic workflows shine when the structure of the work depends on the work itself — when you need Claude to inspect the task and design a harness for it.
Chapter 3 — How Dynamic Workflows Execute
Dynamic workflows are JavaScript files. When Claude builds a workflow, it writes a .js file that uses special orchestration functions alongside standard JavaScript (JSON, Math, Array, etc.).
Core orchestration capabilities:
- Spawn subagents: Launch Claude instances with specific prompts, models, and isolation levels
- Choose models per agent: Route different subtasks to Haiku (cheap/fast), Sonnet (balanced), or Opus (most capable)
- Worktree isolation: Run agents in git worktrees so their file changes don't conflict
- Session resumption: If a workflow is interrupted (user action, terminal quit), resuming the session picks up where it left off
Triggering workflows:
- Ask Claude directly: "Use a workflow to..." or "Set up a workflow that..."
- Use the trigger word
ultracode— guarantees Claude Code builds a workflow rather than attempting the task inline
The workflow runs as a deterministic JavaScript program. The non-deterministic intelligence lives inside the spawned agents; the orchestrating code is plain, predictable JS. This split is what makes workflows debuggable, resumable, and shareable.
Part II — The Six Core Patterns
Chapter 4 — Pattern 1: Classify-and-Act
What it is: A classifier agent first determines the type of task or input, then routes to specialized agents or behaviors based on that classification.
Variants:
- Upfront classification: Classify first, then route (most common)
- Post-hoc classification: Do the work, then classify the output to determine how to format or present results
When to use:
- Heterogeneous inputs that need different handling (support tickets with different categories)
- When the work to be done depends on properties of the input you don't know upfront
- When you want consistent output format regardless of input variation
Example prompt:
"Here's a folder of 80 resumes, use a workflow to rank them for the backend role
and double-check the top ten. Interview me using the AskUserQuestion tool for a rubric."
The workflow would: (1) classify resumes by experience level, (2) route to role-specific evaluators, (3) collect results, (4) rank.
Anti-patterns:
- Don't use classify-and-act when inputs are uniform — it just adds latency
- Don't use it for branching that can be determined by a simple regex or string check — that's plain JS, not an agent decision
Chapter 5 — Pattern 2: Fan-Out-and-Synthesize
What it is: Split a task into N independent subtasks, run an agent on each in parallel, then collect and synthesize results in a barrier step.
Input
│
├──> Agent 1 (subtask A) ──┐
├──> Agent 2 (subtask B) ──┤
├──> Agent 3 (subtask C) ──┤── Synthesizer Agent ──> Output
└──> Agent N (subtask N) ──┘
(barrier: wait for all)
Why clean context windows matter: When agents work in isolation, their results don't cross-contaminate. Agent 2's findings about module B don't bias Agent 3's review of module C.
The synthesize step is a barrier: It waits for ALL fan-out agents to complete before merging their structured outputs into one result.
When to use:
- Large number of smaller independent steps (code review across 50 files)
- Each subtask benefits from focused context (research across separate topics)
- Parallel execution dramatically reduces wall-clock time
Example prompts:
"Use a workflow to dig through #incidents in Slack for the past six months
and find recurring root causes where nobody has filed a ticket."
"Go through my blog post draft and using a workflow verify every technical
claim against the codebase, I don't want to ship anything wrong."
Structured output discipline: Each fan-out agent should return a structured object (JSON) that the synthesizer can merge mechanically. Free-form text is hard to combine; schemas force clarity.
Chapter 6 — Pattern 3: Adversarial Verification
What it is: For every agent that produces an output, spawn a second agent whose sole job is to adversarially challenge that output against a rubric.
Task ──> Worker Agent ──> Output ──> Verifier Agent ──> Verified Output
(adversarial) or Rejection
Why adversarial matters: A worker agent has self-preferential bias toward its own output. A verifier with no knowledge of the worker's reasoning process is structurally more skeptical.
Skeptic persona pattern: Give the verifier agent explicit "skeptic" instructions — "assume the worker made mistakes, look for them specifically, do not accept the output unless you can actively verify each claim."
When to use:
- Security reviews (worker finds vulnerabilities; verifier challenges each finding)
- Factual research (worker finds sources; verifier checks source quality)
- Code migrations (worker makes change; verifier checks correctness)
- Any task where false positives are costly and you need high confidence
Verifier prompt template:
"You are an adversarial reviewer. The worker agent produced the following output.
Assume the worker made at least one mistake. Your job is to find it.
Reject the output unless you can independently verify each claim against [source].
Do not trust the worker's reasoning — re-derive each conclusion yourself."
Chapter 7 — Pattern 4: Generate-and-Filter
What it is: Generate many candidate outputs, then filter by quality, verification, and deduplication to return only the highest-quality results.
Prompt ──> Generator (N options) ──> Filter Agent ──> Dedup ──> Top K Results
When to use:
- Creative tasks with qualitative criteria (naming, design, taglines)
- Idea generation where quantity first, quality second
- When you want diversity of approaches before converging
Example prompt:
"I need a name for this CLI tool. Use a workflow to brainstorm a bunch of options
and run a tournament to pick the top 3."
Diversity tactics:
- Spawn generators with explicit "differentiate from these" instructions to force variety
- Use different temperatures or different model sizes for different generators
- Have generators target different sub-niches (one for technical names, one for whimsical, etc.)
Chapter 8 — Pattern 5: Tournament
What it is: Instead of dividing work, have N agents compete on the same task using different approaches. A judging agent compares pairs of outputs until a winner emerges.
Task ──> Agent A (approach 1) ──┐
Task ──> Agent B (approach 2) ──┤──> Judge ──> Pairwise ──> Winner
Task ──> Agent C (approach 3) ──┘ Comparisons
Why pairwise works better than absolute scoring: Comparative judgment ("which of A vs B is better?") is more reliable than asking a judge to score each output on a 1-10 scale. The judge only needs to make local comparisons, not global assessments.
When to use:
- Taste-based decisions (design, naming, copy)
- Solutions where the "best" approach is unclear until you see alternatives
- Sorting large lists where qualitative judgment matters
- Any case where you want to explore the space before converging
Bracket structure: For N candidates, the deterministic tournament loop holds the bracket structure in the workflow's own context — only the current running comparison stays in each judge's context.
Tournament formats:
| Format | Comparisons | Use For |
|---|---|---|
| Single elimination | N-1 | Quick top-1 selection |
| Round-robin | N*(N-1)/2 | Need full ranking |
| Swiss / bracket-with-byes | N log N | Larger lists, balanced |
Chapter 9 — Pattern 6: Loop Until Done
What it is: Spawn agents repeatedly, checking a stop condition after each round, rather than a fixed number of passes.
while (stopCondition === false):
spawn agent(remaining_work)
check stopCondition
return results
Stop conditions:
- No new findings (security scan found nothing new this pass)
- No more errors in logs
- All items processed (queue empty)
- Quality threshold met (review agent approves)
When to use:
- Tasks with unknown amounts of work (security scan until clean)
- Iterative refinement until quality threshold
- Continuous triage (pair with
/loopfor ongoing operation)
Example prompt:
"This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it,
form theories and adversarially test them in worktrees /goal don't stop
until one theory works."
Always set a maximum iteration cap. Even with a clear stop condition, hardware bugs, model errors, or external service flakes can cause infinite loops. Cap at N rounds and surface a "stopped early" signal if the cap is hit.
Part III — Use Cases
Chapter 10 — Migrations and Refactors
Large-scale code migrations benefit enormously from workflows because they decompose naturally into independent parallel units (files, modules, callsites).
The Bun/Rust pattern (as used in the real Bun Zig→Rust rewrite):
- Enumerate all units of work (callsites, failing tests, modules)
- Spawn a subagent for each unit in its own worktree
- Each agent makes its fix
- Adversarial review agent checks each change
- Merge approved changes
Key performance tip: Instruct agents to avoid resource-intensive commands (large grep, full rebuilds) so you can maximize parallelization without exhausting machine resources.
Example prompt:
"Use a workflow to rename our User model to Account everywhere."
Migration checklist:
| Step | Owner |
|---|---|
| Enumerate callsites | Orchestrator (deterministic JS) |
| Generate per-callsite fix | Worker agent (Sonnet usually sufficient) |
| Verify each fix compiles + tests pass | Verifier agent |
| Resolve cross-callsite conflicts | Synthesizer agent (Opus) |
| Final integration check | Single Opus pass over merged result |
Chapter 11 — Deep Research
Claude Code's built-in /deep-research skill is itself a dynamic workflow. It demonstrates the fan-out-and-synthesize pattern applied to research:
- Fan out: Run N web searches in parallel
- Fetch: Pull source content for each result
- Verify: Adversarially verify claims from each source
- Synthesize: Merge findings into a cited report with a barrier step
Beyond web search — research from internal sources:
- Mine Slack channels for status patterns, incident trends
- Explore how a feature works by fanning out across the codebase
- Compile reports from JIRA/Linear tickets in parallel
Citation discipline: Force every claim in the synthesizer output to carry a citation back to the source that produced it. The synthesizer drops any unverifiable claim rather than smoothing it over.
Chapter 12 — Deep Verification
The inverse of research: you have a document and want to verify every factual claim in it.
Workflow pattern:
- Agent 1: Scan document, extract all factual claims as structured list
- Fan out: One agent per claim — verify each independently
- Verifier agents: Check that each source is high-quality (not circular references, not outdated)
- Synthesize: Report verified, unverified, and wrong claims
Example prompt:
"Go through my blog post draft and using a workflow verify every technical claim
against the codebase, I don't want to ship anything wrong."
Output format:
| Claim | Status | Source | Notes |
|---|---|---|---|
| "Our API supports HTTP/2" | Verified | src/server/http.ts:42 | confirmed |
| "Latency is under 100ms" | Unverified | — | no benchmark in repo |
| "We use Postgres 15" | Wrong | docker-compose.yml shows 14 | needs correction |
Chapter 13 — Sorting at Scale
Sorting 1000+ items by qualitative measurement (bug severity, resume quality, support ticket priority) degrades badly in a single context window — the list doesn't fit and quality collapses.
Workflow approaches:
| Approach | Best For | How |
|---|---|---|
| Tournament | Small-medium lists (<200), taste decisions | Pairwise comparisons, bracket-style |
| Parallel bucket-rank then merge | Large lists (1000+) | Fan out into buckets, rank within each, merge ranks |
| Pairwise pipeline | Any size, highest accuracy | Each comparison is its own agent |
Key insight: Each comparison is its own agent — the deterministic loop holds the bracket, and only the running order stays in any agent's context. Agents never see the full list; they only see two items at a time.
Bucket-rank merge example:
1000 resumes
├─> Bucket A (250) → ranked locally → [A1..A250]
├─> Bucket B (250) → ranked locally → [B1..B250]
├─> Bucket C (250) → ranked locally → [C1..C250]
└─> Bucket D (250) → ranked locally → [D1..D250]
merge-rank agent → final ranking
Chapter 14 — Memory and Rule Adherence
Problem: Even rules in CLAUDE.md get missed or misapplied, especially as context grows.
Solution — one verifier per rule:
- Create a workflow with an explicit list of rules
- Spawn one verifier agent per rule
- Each verifier checks ONLY its assigned rule — focused, no distraction
- Skeptic agent reviews the verifiers to prevent false positives
Mining rules from sessions (reverse direction):
- Read last N sessions (fan out)
- Cluster corrections you keep making (parallel clustering agents)
- For each candidate rule: adversarially verify — would this rule have prevented a real mistake?
- Distill survivors back into CLAUDE.md
Example prompt:
"Using a workflow, go through my last 50 sessions and mine them for corrections
I keep making and turn the recurring ones into CLAUDE.md rules"
Chapter 15 — Root-Cause Investigation
Problem: Single-context debugging leads to self-preferential bias — Claude formed a hypothesis early and now cherry-picks evidence to support it.
Solution — structurally separated evidence streams:
- Fan out: Separate agents for different evidence sources (logs agent, code agent, data agent)
- Each agent generates hypotheses independently from its evidence slice
- Panel of verifier + refuter agents challenges each hypothesis
- Synthesizer consolidates surviving hypotheses
This is not just for code:
- Sales analytics: "Why did sales drop in March?" — separate agents for marketing data, product changes, competitive moves, economic factors
- Data engineering: "Why did this pipeline fail?" — separate agents for each system component
- Post-mortem exercises: Any root-cause analysis benefits from structurally independent investigation
Hypothesis lifecycle:
| Stage | Agent | Output |
|---|---|---|
| Evidence gather | Logs / Code / Data agents (parallel) | Structured findings per source |
| Hypothesis form | Per-source theorist | Candidate root causes |
| Adversarial test | Refuter agent | Pass / refuted / inconclusive |
| Consolidate | Synthesizer | Ranked surviving hypotheses |
Chapter 16 — Triaging at Scale
Every team has a queue (support tickets, bug reports, PR reviews) that can't be fully processed by humans.
Triage workflow pattern:
- Classify each item by category, severity, and urgency
- Deduplicate against already-tracked items
- Route: attempt automated fix, escalate to human, or archive
Quarantine pattern (security-critical): Agents that READ untrusted public content are not allowed to take high-privilege actions. A separate privileged agent acts on the information after the reading agent summarizes it. This prevents prompt injection from untrusted content from triggering destructive actions.
Continuous triage: Pair with /loop to run triage at regular intervals — the workflow runs, processes the queue, then sleeps until the next cycle.
Example prompt:
"Use a workflow to dig through #incidents in Slack for the past six months
and find recurring root causes where nobody has filed a ticket."
Chapter 17 — Exploration and Taste
Taste-based decisions (naming, design, copy, architecture style) benefit from:
- Many candidate options (generate-and-filter)
- Structured evaluation against a rubric
- Tournament-style selection to converge on the best
Pattern:
- Generate N candidates
- Give review agent a rubric for what "good" looks like
- Review agent iterates until rubric criteria are met, or tournament selects the winner
Example prompts:
"Take my business plan and run a workflow where different agents tear it apart
from an investor's, a customer's, and a competitor's perspective."
"I need a name for this CLI tool. Use a workflow to brainstorm a bunch of options
and run a tournament to pick the top 3."
Chapter 18 — Evals
Build lightweight evals for your own skills, prompts, or code:
- Spin off N agents in isolated worktrees — each runs the thing being evaluated
- Comparison agents grade outputs pairwise against a rubric
- Aggregate grades to produce a ranked result
- Optionally: use the lowest-scoring outputs to refine the evaluated item and re-run
When to use: When you've built a skill or feature and want to measure quality before shipping, or when you want to continuously monitor quality as you make changes.
Eval pipeline:
| Step | Purpose |
|---|---|
| Define rubric | Concrete criteria the output must satisfy |
| Generate test inputs | N representative scenarios |
| Run candidate | Each worktree runs the target skill/prompt |
| Pairwise grade | Comparison agent picks better of each pair |
| Rank | Aggregate pairwise wins into a ranking |
| Refine | Use the worst outputs to improve the target |
Chapter 19 — Model and Intelligence Routing
Not every subtask needs Opus. A well-designed workflow routes subtasks to the appropriate model:
| Task Type | Recommended Model | Reason |
|---|---|---|
| Classification, routing decisions | Haiku | Fast, cheap, sufficient for binary decisions |
| Standard implementation, search | Sonnet | Good balance of speed and capability |
| Complex reasoning, synthesis, review | Opus | Maximum capability for judgment tasks |
Classifier agent pattern:
- Classifier agent researches the task (reads relevant files, counts lines, assesses complexity)
- Based on findings, routes to Sonnet or Opus for the actual work
- Result: expensive Opus tokens used only where they're needed
Example: "Explain how the auth module works" — the right model depends on how many files are in the auth module. Classifier reads the file tree, picks the model, then that model does the explanation.
Routing heuristics:
- If the work fits in <10k tokens and has a clear right answer → Haiku
- If the work needs to read multiple files and write code → Sonnet
- If the work needs judgment, synthesis, or adversarial review → Opus
Part IV — Practical Usage
Chapter 20 — Triggering Workflows
Two ways to start a workflow:
- Natural language: "Use a workflow to..." / "Set up a workflow that..." / "Build a workflow for..."
- Ultracode trigger word: Typing
ultracodein your prompt guarantees Claude Code builds a workflow rather than attempting the task inline
Quick workflows: Workflows aren't only for large tasks. Prompt for "quick workflow" to get a small adversarial check, assumption validation, or fast parallel search.
Detailed prompting gets better workflows: The more specific you are about what pattern you want (fan-out, tournament, adversarial verification), the better the generated workflow. Claude is intelligent enough to implement any of the six patterns if you name them.
Trigger phrase reference:
| You type | What happens |
|---|---|
| "Use a workflow to..." | Claude proposes and builds a workflow |
| "Set up a workflow that..." | Same, often slightly more interactive |
| "Build a quick workflow..." | Lightweight workflow, small budget |
| "ultracode <task>" | Guaranteed workflow path, no inline attempt |
Chapter 21 — Prompting Strategies
Name the pattern explicitly:
"Use a fan-out-and-synthesize workflow to..."
"Build an adversarial verification workflow that..."
"Run a tournament workflow to decide between..."
Specify models where it matters:
"Use Haiku agents for the classification step and Opus for the final synthesis"
Set stop conditions clearly:
"...don't stop until at least one hypothesis survives adversarial review"
Specify parallelism constraints:
"Don't use resource-intensive commands so we can parallelize maximally"
Combine with /goal for hard completion requirements:
"Use a workflow to fix all lint errors /goal zero lint errors remaining"
Combine with /loop for recurring workflows:
"Set up a triage workflow for new GitHub issues, then /loop to run it hourly"
Prompting checklist:
- Did you name the pattern (fan-out, tournament, etc.)?
- Did you specify the stop condition or completion criteria?
- Did you set a token or time budget?
- Did you route models appropriately (Haiku/Sonnet/Opus)?
- Did you call out any invariants ("don't touch X")?
Chapter 22 — Token Budgets
Dynamic workflows use more tokens than single-context execution — sometimes significantly more. Each spawned agent uses its own context.
Setting explicit budgets:
"Use a workflow for this, budget 10k tokens"
"Build a quick workflow, keep it under 5k tokens total"
Cost optimization strategies:
- Use cheap models (Haiku) for classification and routing steps
- Set low token budgets for each individual agent where appropriate
- Limit parallelism when running many agents simultaneously
- Use "quick workflow" for small tasks rather than heavy orchestration
- Ask yourself: does this task truly benefit from multiple agents, or is it a standard coding task that the default harness handles well?
When workflows are NOT worth the token cost:
- Standard coding tasks (write a function, fix a bug, add a test)
- Tasks that fit comfortably in one context window
- Tasks that don't benefit from parallelism or adversarial review
- Quick one-off changes
Budget reference rules of thumb:
| Workflow scale | Typical token spend | When |
|---|---|---|
| Quick check | 2k–5k | Single adversarial pass, validation |
| Standard fan-out | 20k–80k | 5–20 agents on a focused task |
| Deep research | 100k–500k | Many sources, deep verification |
| Large migration | 500k+ | Hundreds of worktrees, full review pass |
Chapter 23 — Saving and Sharing Workflows
Saving a workflow:
- Press
sin the workflow menu while a workflow is running - Workflow is saved to
~/.claude/workflows/
Sharing via skills:
- Put the JavaScript workflow files in your skill folder
- Reference them in
SKILL.MD - Distribute the skill (GitHub, npm, Claude Code skills marketplace)
Using shared workflows as templates:
When receiving a workflow from someone else, prompt Claude to treat it as a template rather than a verbatim script:
"Use this workflow as a template but adapt it for my specific codebase structure"
This allows flexible reuse without hard-coded assumptions from the original context.
Versioning shared workflows:
- Treat workflows like code — commit them, review them, tag releases
- Document the input contract (what the workflow expects) and output contract (what it produces)
- Include a sample input/output pair in the skill folder for grounding
Chapter 24 — Combining with /loop and /goal
/loop integration: Pair repeatable workflows (triage, monitoring, rule-checking) with /loop to run at regular intervals:
"Set up a triage workflow for the #bugs Slack channel, pair with /loop to check every 4 hours"
/goal integration: Set a hard stop condition that the workflow must achieve:
"Use a workflow to find and fix flaky tests /goal all tests passing reliably over 100 runs"
Combined pattern: /loop drives the cadence; /goal defines when the loop can stop.
Practical recipes:
| Goal | Recipe |
|---|---|
| Hourly issue triage | workflow + /loop 1h |
| Fix every flaky test until green | workflow + /goal all tests green |
| Daily CLAUDE.md rule audit | workflow + /loop 24h |
| Migrate until zero callsites left | workflow + /goal zero remaining callsites |
Part V — Reference
Chapter 25 — Pattern Selection Guide
| You want to... | Use this pattern |
|---|---|
| Handle heterogeneous inputs differently | Classify-and-Act |
| Process N items independently in parallel | Fan-Out-and-Synthesize |
| Check work for errors or misses | Adversarial Verification |
| Generate many options then pick best | Generate-and-Filter or Tournament |
| Make a taste-based decision | Tournament |
| Run until something is done | Loop Until Done |
| Combine multiple patterns | Compose them — patterns are composable |
Composition example: A migration is Fan-Out (per file) + Adversarial Verification (per change) + Loop Until Done (until zero remaining callsites) + Classify-and-Act (route trivial vs complex callsites to different models).
Chapter 26 — Use Case Quick Reference
| Use Case | Primary Pattern | Example |
|---|---|---|
| Large migration/refactor | Fan-out + Adversarial | Rename User to Account everywhere |
| Deep research | Fan-out + Synthesize | Research Slack incidents, find patterns |
| Claim verification | Fan-out + Adversarial | Verify every claim in a blog post |
| Sorting at scale | Tournament or Bucket-rank | Rank 80 resumes for a role |
| CLAUDE.md rule mining | Fan-out + Adversarial | Mine last 50 sessions for rules |
| Root-cause analysis | Fan-out + Adversarial | Debug flaky test across N hypotheses |
| Continuous triage | Loop + Classify-and-Act | Process support queue hourly |
| Design/naming | Generate-and-Filter + Tournament | Pick top 3 names for CLI tool |
| Evals | Fan-out + Pairwise comparison | Grade skill outputs against rubric |
| Model routing | Classify-and-Act | Route auth questions to Opus, trivial to Haiku |
Chapter 27 — Failure Mode Reference
| Failure Mode | What Happens | Workflow Fix |
|---|---|---|
| Agentic laziness | Claude stops at 40/50 items, says "done" | Each agent has a fixed, bounded scope; completion is measurable |
| Self-preferential bias | Claude grades its own work too generously | Separate verifier agents have no access to the worker's reasoning process |
| Goal drift | "Don't touch auth" forgotten after compaction | Each agent's goal is explicit in its spawn prompt; loop context holds invariants |
Diagnostic questions when a workflow goes wrong:
- Did the agent's spawn prompt include the invariant that was violated?
- Was the stop condition measurable, or fuzzy?
- Did the verifier have independent access to the source of truth, or was it just looking at the worker's output?
- Was the task split fine enough that each agent could finish within its budget?
Chapter 28 — Example Prompt Gallery
Debugging:
"This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it, form theories
and adversarially test them in worktrees /goal don't stop until one theory works."
Knowledge mining:
"Using a workflow, go through my last 50 sessions and mine them for corrections I keep
making and turn the recurring ones into CLAUDE.md rules"
Incident analysis:
"Use a workflow to dig through #incidents in Slack for the past six months and find
recurring root causes where nobody has filed a ticket."
Adversarial review:
"Take my business plan and run a workflow where different agents tear it apart from
an investor's, a customer's, and a competitor's perspective."
Structured hiring:
"Here's a folder of 80 resumes, use a workflow to rank them for the backend role and
double-check the top ten. Interview me using the AskUserQuestion tool for a rubric."
Naming with tournament:
"I need a name for this CLI tool. Use a workflow to brainstorm a bunch of options
and run a tournament to pick the top 3."
Large refactor:
"Use a workflow to rename our User model to Account everywhere."
Technical fact-checking:
"Go through my blog post draft and using a workflow verify every technical claim
against the codebase, I don't want to ship anything wrong."
Closing Notes
Dynamic workflows shift the unit of work in Claude Code from "a single Claude turn" to "an orchestrated program of Claude turns." The orchestration is deterministic JavaScript; the intelligence inside each step is a focused Claude agent with its own context.
The most important skill is recognizing when a task wants a workflow rather than a single turn. The three failure modes — agentic laziness, self-preferential bias, and goal drift — are your signals. When you see them coming, reach for the right pattern from the six core patterns and compose as needed.
When in doubt: start with a quick workflow, set a clear stop condition, name the pattern explicitly, route models by intelligence required, and adversarially verify anything that matters.