Anthropic

Dynamic Workflows in Claude Code: A Complete Guide to Multi-Agent Harnesses

Listed: June 3, 2026
Updated: June 3, 2026

About

Complete reference for Claude Code's dynamic workflows feature. Covers the JavaScript orchestration model, six workflow patterns (classify-and-act, fan-out-and-synthesize, adversarial verification, generate-and-filter, tournament, loop-until-done), all major use cases (migrations, research, verification, triage, evals, model routing), prompting strategies, token budgets, and saving/sharing workflows. Based on Anthropic's official blog post by Thariq Shihipar and Sid Bidasaria.


Author: Research compilation (Anthropic blog + Claude Code documentation)
Source: "A harness for every task", Anthropic Blog, 2026 (Thariq Shihipar & Sid Bidasaria)
Target: Claude Code users who want to orchestrate multi-agent workflows for complex tasks
Date: 2026-06-03


Part I — What Are Dynamic Workflows?

Chapter 1 — The Problem Workflows Solve

The default Claude Code harness is built for coding tasks, and most Claude Code work is exactly that. But certain task classes push beyond what a single context window handles well:

  • Long-running tasks that exceed practical context limits
  • Massively parallel tasks where independent work would benefit from isolated contexts
  • Highly structured tasks that need adversarial checks or rubric-based evaluation
  • Tasks requiring different intelligence levels for different subtasks

Before dynamic workflows, these required building custom static harnesses using the Claude Agent SDK or claude -p — generic infrastructure that needed to handle every edge case.

The three failure modes of single-context-window execution:

| Failure Mode | Description | Example |
| --- | --- | --- |
| Agentic laziness | Claude stops before finishing a complex multi-part task and declares it done | Addresses 20 of 50 items in a security review and reports completion |
| Self-preferential bias | Claude prefers its own results when asked to verify or judge them | Code reviewer that wrote the code grades it higher than warranted |
| Goal drift | Gradual loss of fidelity to original objective across many turns; lossy compaction causes edge-case requirements to disappear | "Don't touch the auth module" constraint gets lost after several summarization rounds |

Dynamic workflows combat all three by orchestrating separate Claude instances — each with their own context window, focused on an isolated goal.

Chapter 2 — Dynamic vs Static Workflows

| Property | Static Workflow | Dynamic Workflow |
| --- | --- | --- |
| Created by | Human (pre-written JavaScript/SDK) | Claude itself, on the fly |
| Scope | Generic: must handle all edge cases | Purpose-built for the specific task |
| Flexibility | Fixed structure | Adapts to task requirements |
| Quality | Good for known, repeatable patterns | Best for novel or complex tasks |
| Setup | Requires engineering effort | Just ask Claude, or say "ultracode" |
| Reusability | Highly reusable | Can be saved and shared as templates |
| Requires | Claude Agent SDK or claude -p | Built into Claude Code |

Static workflows shine when you have a known, repeatable process that should always run the same way. Dynamic workflows shine when the structure of the work depends on the work itself — when you need Claude to inspect the task and design a harness for it.

Chapter 3 — How Dynamic Workflows Execute

Dynamic workflows are JavaScript files. When Claude builds a workflow, it writes a .js file that uses special orchestration functions alongside standard JavaScript (JSON, Math, Array, etc.).

Core orchestration capabilities:

  • Spawn subagents: Launch Claude instances with specific prompts, models, and isolation levels
  • Choose models per agent: Route different subtasks to Haiku (cheap/fast), Sonnet (balanced), or Opus (most capable)
  • Worktree isolation: Run agents in git worktrees so their file changes don't conflict
  • Session resumption: If a workflow is interrupted (user action, terminal quit), resuming the session picks up where it left off

Triggering workflows:

  • Ask Claude directly: "Use a workflow to..." or "Set up a workflow that..."
  • Use the trigger word ultracode — guarantees Claude Code builds a workflow rather than attempting the task inline

The workflow runs as a deterministic JavaScript program. The non-deterministic intelligence lives inside the spawned agents; the orchestrating code is plain, predictable JS. This split is what makes workflows debuggable, resumable, and shareable.


Part II — The Six Core Patterns

Chapter 4 — Pattern 1: Classify-and-Act

What it is: A classifier agent first determines the type of task or input, then routes to specialized agents or behaviors based on that classification.

Variants:

  • Upfront classification: Classify first, then route (most common)
  • Post-hoc classification: Do the work, then classify the output to determine how to format or present results

When to use:

  • Heterogeneous inputs that need different handling (support tickets with different categories)
  • When the work to be done depends on properties of the input you don't know upfront
  • When you want consistent output format regardless of input variation

Example prompt:

"Here's a folder of 80 resumes, use a workflow to rank them for the backend role
and double-check the top ten. Interview me using the AskUserQuestion tool for a rubric."

The workflow would: (1) classify resumes by experience level, (2) route to role-specific evaluators, (3) collect results, (4) rank.

Anti-patterns:

  • Don't use classify-and-act when inputs are uniform — it just adds latency
  • Don't use it for branching that can be determined by a simple regex or string check — that's plain JS, not an agent decision
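The upfront-classification variant can be sketched in plain JavaScript. This is a minimal illustration, not Claude Code's orchestration API: `classify`, `handlers`, and `classifyAndAct` are hypothetical names, and the classifier is a keyword stub standing in for a spawned classifier agent so that the example runs on its own.

```javascript
// Hypothetical stand-in for a classifier agent. In a real workflow this
// would be a spawned agent returning a label; here it is a keyword stub.
function classify(ticket) {
  const text = ticket.toLowerCase();
  if (text.includes("refund") || text.includes("invoice")) return "billing";
  if (text.includes("crash") || text.includes("error")) return "bug";
  return "other";
}

// One handler per category. Each would wrap a specialized agent prompt.
const handlers = {
  billing: (t) => ({ category: "billing", action: "route-to-finance", ticket: t }),
  bug: (t) => ({ category: "bug", action: "open-issue", ticket: t }),
  other: (t) => ({ category: "other", action: "human-review", ticket: t }),
};

// Classify first, then act. Every branch returns the same output shape.
function classifyAndAct(tickets) {
  return tickets.map((t) => handlers[classify(t)](t));
}

const routed = classifyAndAct([
  "App crash on login",
  "Please refund my last invoice",
  "How do I export data?",
]);
console.log(routed.map((r) => r.action));
```

Because every handler returns the same shape, downstream steps can treat the results uniformly regardless of which branch produced them.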

Chapter 5 — Pattern 2: Fan-Out-and-Synthesize

What it is: Split a task into N independent subtasks, run an agent on each in parallel, then collect and synthesize results in a barrier step.

Input
  │
  ├──> Agent 1 (subtask A) ──┐
  ├──> Agent 2 (subtask B) ──┤
  ├──> Agent 3 (subtask C) ──┤── Synthesizer Agent ──> Output
  └──> Agent N (subtask N) ──┘
        (barrier: wait for all)

Why clean context windows matter: When agents work in isolation, their results don't cross-contaminate. Agent 2's findings about module B don't bias Agent 3's review of module C.

The synthesize step is a barrier: It waits for ALL fan-out agents to complete before merging their structured outputs into one result.

When to use:

  • Large number of smaller independent steps (code review across 50 files)
  • Each subtask benefits from focused context (research across separate topics)
  • Parallel execution dramatically reduces wall-clock time

Example prompts:

"Use a workflow to dig through #incidents in Slack for the past six months
and find recurring root causes where nobody has filed a ticket."
"Go through my blog post draft and using a workflow verify every technical
claim against the codebase, I don't want to ship anything wrong."

Structured output discipline: Each fan-out agent should return a structured object (JSON) that the synthesizer can merge mechanically. Free-form text is hard to combine; schemas force clarity.
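A minimal sketch of the barrier and the mechanical merge, assuming each agent resolves to an object with a shared schema. `reviewFile` is a hypothetical stub standing in for a spawned subagent; it is not part of any real API.

```javascript
// Hypothetical stand-in for spawning a subagent on one subtask.
// It resolves to a structured object, as recommended above.
async function reviewFile(file) {
  const findings = file.endsWith(".sql")
    ? [{ file, issue: "unparameterized query" }]
    : [];
  return { file, findings };
}

// Fan out: start every subtask, then wait at the barrier for all of them.
async function fanOutAndSynthesize(files) {
  const results = await Promise.all(files.map(reviewFile)); // barrier
  // Synthesize: a mechanical merge, possible because outputs share a schema.
  return {
    filesReviewed: results.length,
    findings: results.flatMap((r) => r.findings),
  };
}

const report = fanOutAndSynthesize(["a.js", "b.sql", "c.js"]);
report.then((r) => console.log(r.filesReviewed, r.findings.length));
```

`Promise.all` rejects as soon as any subtask fails. If partial results are acceptable, `Promise.allSettled` is the usual alternative, at the cost of handling the failed entries in the merge step.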

Chapter 6 — Pattern 3: Adversarial Verification

What it is: For every agent that produces an output, spawn a second agent whose sole job is to adversarially challenge that output against a rubric.

Task ──> Worker Agent ──> Output ──> Verifier Agent ──> Verified Output
                                         (adversarial)     or Rejection

Why adversarial matters: A worker agent has self-preferential bias toward its own output. A verifier with no knowledge of the worker's reasoning process is structurally more skeptical.

Skeptic persona pattern: Give the verifier agent explicit "skeptic" instructions — "assume the worker made mistakes, look for them specifically, do not accept the output unless you can actively verify each claim."

When to use:

  • Security reviews (worker finds vulnerabilities; verifier challenges each finding)
  • Factual research (worker finds sources; verifier checks source quality)
  • Code migrations (worker makes change; verifier checks correctness)
  • Any task where false positives are costly and you need high confidence

Verifier prompt template:

"You are an adversarial reviewer. The worker agent produced the following output.
Assume the worker made at least one mistake. Your job is to find it.
Reject the output unless you can independently verify each claim against [source].
Do not trust the worker's reasoning — re-derive each conclusion yourself."
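The accept-or-reject decision itself can stay deterministic. The sketch below assumes the worker returns structured claims and that the verifier can check them against a source of truth; the data and the `verify` function are illustrative placeholders rather than anything from Claude Code.

```javascript
// Source of truth the verifier checks against (illustrative data).
const source = { http2: true, postgres: 14 };

// Hypothetical worker output: each claim carries the value the worker asserts.
const workerOutput = [
  { key: "http2", claimed: true },
  { key: "postgres", claimed: 15 },
];

// Verifier: sees only the claims and the source, never the worker's reasoning.
// A claim passes only if it matches what the source says.
function verify(claims, truth) {
  const rejected = claims.filter((c) => truth[c.key] !== c.claimed);
  return { accepted: rejected.length === 0, rejected };
}

const verdict = verify(workerOutput, source);
console.log(verdict.accepted, verdict.rejected.map((c) => c.key));
```

The verifier receives the claims and the source but nothing about how the worker reached them, which mirrors the structural separation the pattern relies on.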

Chapter 7 — Pattern 4: Generate-and-Filter

What it is: Generate many candidate outputs, then filter by quality, verification, and deduplication to return only the highest-quality results.

Prompt ──> Generator (N options) ──> Filter Agent ──> Dedup ──> Top K Results

When to use:

  • Creative tasks with qualitative criteria (naming, design, taglines)
  • Idea generation where quantity first, quality second
  • When you want diversity of approaches before converging

Example prompt:

"I need a name for this CLI tool. Use a workflow to brainstorm a bunch of options
and run a tournament to pick the top 3."

Diversity tactics:

  • Spawn generators with explicit "differentiate from these" instructions to force variety
  • Use different temperatures or different model sizes for different generators
  • Have generators target different sub-niches (one for technical names, one for whimsical, etc.)
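The filter, dedup, and top-K stages are ordinary array operations once candidates carry a score. In this sketch the scores are made-up values standing in for a filter agent's grades, and `generateAndFilter` is a hypothetical helper name.

```javascript
// Candidates as they might come back from several generator agents.
// Scores are illustrative stand-ins for a filter agent's grades.
const candidates = [
  { name: "Quill", score: 8 },
  { name: "quill", score: 7 }, // duplicate modulo case
  { name: "Forge", score: 9 },
  { name: "X", score: 3 }, // below the quality bar
  { name: "Lumen", score: 6 },
];

// Filter by a quality threshold, dedupe on a normalized key, keep the top K.
function generateAndFilter(items, { minScore, k }) {
  const seen = new Set();
  return items
    .filter((c) => c.score >= minScore)
    .sort((a, b) => b.score - a.score) // best first, so the best duplicate wins
    .filter((c) => {
      const key = c.name.toLowerCase();
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    })
    .slice(0, k);
}

const top = generateAndFilter(candidates, { minScore: 5, k: 2 });
console.log(top.map((c) => c.name));
```

Sorting before deduplicating means that when two candidates collide, the higher-scored one survives.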

Chapter 8 — Pattern 5: Tournament

What it is: Instead of dividing work, have N agents compete on the same task using different approaches. A judging agent compares pairs of outputs until a winner emerges.

Task ──> Agent A (approach 1) ──┐
Task ──> Agent B (approach 2) ──┤──> Judge ──> Pairwise ──> Winner
Task ──> Agent C (approach 3) ──┘             Comparisons

Why pairwise works better than absolute scoring: Comparative judgment ("which of A vs B is better?") is more reliable than asking a judge to score each output on a 1-10 scale. The judge only needs to make local comparisons, not global assessments.

When to use:

  • Taste-based decisions (design, naming, copy)
  • Solutions where the "best" approach is unclear until you see alternatives
  • Sorting large lists where qualitative judgment matters
  • Any case where you want to explore the space before converging

Bracket structure: For N candidates, the deterministic tournament loop holds the bracket structure in the workflow's own context — only the current running comparison stays in each judge's context.

Tournament formats:

| Format | Comparisons | Use For |
| --- | --- | --- |
| Single elimination | N-1 | Quick top-1 selection |
| Round-robin | N*(N-1)/2 | Need full ranking |
| Swiss / bracket-with-byes | N log N | Larger lists, balanced |
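A short sketch of a single-elimination bracket held in the orchestrating code, which also confirms the N-1 comparison count. `judge` is a hypothetical deterministic stub (it prefers the shorter name); in a workflow each call would be one judging agent that sees only the two entrants.

```javascript
// Hypothetical judge: in a workflow this would be one agent per comparison.
// Here it is a deterministic stub that prefers the shorter name.
function judge(a, b) {
  return a.length <= b.length ? a : b;
}

// Single elimination: pair up entrants each round; an odd entrant gets a bye.
function singleElimination(entrants, pick) {
  let round = [...entrants];
  let comparisons = 0;
  while (round.length > 1) {
    const next = [];
    for (let i = 0; i < round.length; i += 2) {
      if (i + 1 < round.length) {
        next.push(pick(round[i], round[i + 1]));
        comparisons += 1;
      } else {
        next.push(round[i]); // bye
      }
    }
    round = next;
  }
  return { winner: round[0], comparisons };
}

// Round-robin needs every unordered pair.
const roundRobinComparisons = (n) => (n * (n - 1)) / 2;

const result = singleElimination(
  ["orbit", "nib", "lantern", "quay", "sprocket"],
  judge
);
console.log(result, roundRobinComparisons(5));
```

With five entrants the bracket uses 4 comparisons, against 10 for a round-robin over the same list.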

Chapter 9 — Pattern 6: Loop Until Done

What it is: Spawn agents repeatedly, checking a stop condition after each round, rather than a fixed number of passes.

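// spawnAgent and isDone are placeholders, not Claude Code API names
async function loopUntilDone(remainingWork, maxRounds) {
    const results = [];
    for (let round = 0; round < maxRounds && !isDone(results); round++) {
        results.push(await spawnAgent(remainingWork));
    }
    return results;
}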

Stop conditions:

  • No new findings (security scan found nothing new this pass)
  • No more errors in logs
  • All items processed (queue empty)
  • Quality threshold met (review agent approves)

When to use:

  • Tasks with unknown amounts of work (security scan until clean)
  • Iterative refinement until quality threshold
  • Continuous triage (pair with /loop for ongoing operation)

Example prompt:

"This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it,
form theories and adversarially test them in worktrees /goal don't stop
until one theory works."

Always set a maximum iteration cap. Even with a clear stop condition, hardware bugs, model errors, or external service flakes can cause infinite loops. Cap at N rounds and surface a "stopped early" signal if the cap is hit.
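One way to combine a stop condition with a hard cap is sketched below. `runPass` is a hypothetical stub standing in for a spawned scan agent, and the "no new findings" stop condition is one of the options listed above.

```javascript
// Hypothetical pass: stands in for spawning a scan agent on the remaining
// work. This stub "finds" up to two items per pass from a fixed backlog.
function runPass(backlog) {
  return backlog.splice(0, 2);
}

// Loop until a pass finds nothing new, or the cap is hit.
function loopUntilDone(backlog, maxRounds) {
  const findings = [];
  let rounds = 0;
  let stoppedEarly = true; // assume the cap stops us until proven otherwise
  while (rounds < maxRounds) {
    const found = runPass(backlog);
    rounds += 1;
    if (found.length === 0) {
      stoppedEarly = false; // stop condition met: no new findings
      break;
    }
    findings.push(...found);
  }
  return { findings, rounds, stoppedEarly };
}

const clean = loopUntilDone(["a", "b", "c"], 10);
const capped = loopUntilDone(["a", "b", "c", "d", "e"], 2);
console.log(clean, capped);
```

The `stoppedEarly` flag lets the caller distinguish "finished clean" from "ran out of rounds", so a capped run is never reported as complete.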


Part III — Use Cases

Chapter 10 — Migrations and Refactors

Large-scale code migrations benefit enormously from workflows because they decompose naturally into independent parallel units (files, modules, callsites).

The Bun/Rust pattern (as used in the real Bun Zig→Rust rewrite):

  1. Enumerate all units of work (callsites, failing tests, modules)
  2. Spawn a subagent for each unit in its own worktree
  3. Each agent makes its fix
  4. Adversarial review agent checks each change
  5. Merge approved changes

Key performance tip: Instruct agents to avoid resource-intensive commands (large grep, full rebuilds) so you can maximize parallelization without exhausting machine resources.

Example prompt:

"Use a workflow to rename our User model to Account everywhere."

Migration checklist:

| Step | Owner |
| --- | --- |
| Enumerate callsites | Orchestrator (deterministic JS) |
| Generate per-callsite fix | Worker agent (Sonnet usually sufficient) |
| Verify each fix compiles + tests pass | Verifier agent |
| Resolve cross-callsite conflicts | Synthesizer agent (Opus) |
| Final integration check | Single Opus pass over merged result |
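To keep many workers from exhausting the machine, the orchestrator can cap how many run at once. The sketch below is a generic bounded-concurrency helper in plain JavaScript; `fixCallsite` is a hypothetical stub standing in for a worker agent in its own worktree, and `mapWithLimit` is an illustrative name.

```javascript
// Hypothetical per-callsite worker; stands in for an agent in its own worktree.
async function fixCallsite(site) {
  return { site, status: "fixed" };
}

// Run tasks with at most `limit` in flight, preserving input order in results.
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;
  let inFlight = 0;
  let peak = 0; // highest concurrency observed, for inspection
  async function lane() {
    while (next < items.length) {
      const i = next++;
      inFlight += 1;
      peak = Math.max(peak, inFlight);
      results[i] = await task(items[i]);
      inFlight -= 1;
    }
  }
  const lanes = Array.from({ length: Math.min(limit, items.length) }, lane);
  await Promise.all(lanes);
  return { results, peak };
}

const sites = ["a.ts:10", "b.ts:22", "c.ts:5", "d.ts:91", "e.ts:3"];
const run = mapWithLimit(sites, 2, fixCallsite);
run.then(({ results, peak }) => console.log(results.length, peak));
```

Each lane pulls the next unclaimed item when it finishes its current one, so slow callsites do not hold up the rest of the queue.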

Chapter 11 — Deep Research

Claude Code's built-in /deep-research skill is itself a dynamic workflow. It demonstrates the fan-out-and-synthesize pattern applied to research:

  1. Fan out: Run N web searches in parallel
  2. Fetch: Pull source content for each result
  3. Verify: Adversarially verify claims from each source
  4. Synthesize: Merge findings into a cited report with a barrier step

Beyond web search — research from internal sources:

  • Mine Slack channels for status patterns, incident trends
  • Explore how a feature works by fanning out across the codebase
  • Compile reports from JIRA/Linear tickets in parallel

Citation discipline: Force every claim in the synthesizer output to carry a citation back to the source that produced it. The synthesizer drops any unverifiable claim rather than smoothing it over.
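The drop-unverifiable rule can be enforced mechanically before the report is written. This sketch assumes claims arrive as objects with a `cites` array of source IDs; the data and the `enforceCitations` name are illustrative.

```javascript
// Known sources, keyed by ID (illustrative data).
const sources = new Map([
  ["s1", "Incident review, March"],
  ["s2", "Runbook v3"],
]);

const claims = [
  { text: "Outages cluster after deploys", cites: ["s1"] },
  { text: "Retries are always safe", cites: [] }, // no citation
  { text: "Failover is manual", cites: ["s9"] }, // cites an unknown source
];

// Keep a claim only if it cites at least one source and every cited ID resolves.
function enforceCitations(allClaims, knownSources) {
  const kept = [];
  const dropped = [];
  for (const c of allClaims) {
    const ok = c.cites.length > 0 && c.cites.every((id) => knownSources.has(id));
    (ok ? kept : dropped).push(c);
  }
  return { kept, dropped };
}

const checked = enforceCitations(claims, sources);
console.log(checked.kept.length, checked.dropped.length);
```

Returning the dropped claims alongside the kept ones makes the omissions visible instead of silently losing them.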

Chapter 12 — Deep Verification

The inverse of research: you have a document and want to verify every factual claim in it.

Workflow pattern:

  1. Agent 1: Scan document, extract all factual claims as structured list
  2. Fan out: One agent per claim — verify each independently
  3. Verifier agents: Check that each source is high-quality (not circular references, not outdated)
  4. Synthesize: Report verified, unverified, and wrong claims

Example prompt:

"Go through my blog post draft and using a workflow verify every technical claim
against the codebase, I don't want to ship anything wrong."

Output format:

| Claim | Status | Source | Notes |
| --- | --- | --- | --- |
| "Our API supports HTTP/2" | Verified | src/server/http.ts:42 | confirmed |
| "Latency is under 100ms" | Unverified | | no benchmark in repo |
| "We use Postgres 15" | Wrong | docker-compose.yml shows 14 | needs correction |

Chapter 13 — Sorting at Scale

Sorting 1000+ items by qualitative measurement (bug severity, resume quality, support ticket priority) degrades badly in a single context window — the list doesn't fit and quality collapses.

Workflow approaches:

| Approach | Best For | How |
| --- | --- | --- |
| Tournament | Small-medium lists (<200), taste decisions | Pairwise comparisons, bracket-style |
| Parallel bucket-rank then merge | Large lists (1000+) | Fan out into buckets, rank within each, merge ranks |
| Pairwise pipeline | Any size, highest accuracy | Each comparison is its own agent |

Key insight: Each comparison is its own agent. The deterministic loop holds the bracket and the running order, so no agent ever sees the full list; each agent sees only the two items it is comparing.

Bucket-rank merge example:

1000 resumes
  ├─> Bucket A (250) → ranked locally → [A1..A250]
  ├─> Bucket B (250) → ranked locally → [B1..B250]
  ├─> Bucket C (250) → ranked locally → [C1..C250]
  └─> Bucket D (250) → ranked locally → [D1..D250]
        merge-rank agent → final ranking
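The bucket-rank-then-merge shape can be sketched as follows. Numeric scores stand in for an agent's qualitative judgment so the example is runnable; `rankBucket` is a hypothetical stub for a per-bucket ranking agent, and in a real workflow the merge step's comparison would also be an agent call.

```javascript
// Split items into buckets of at most `size`.
function chunk(items, size) {
  const buckets = [];
  for (let i = 0; i < items.length; i += size) {
    buckets.push(items.slice(i, i + size));
  }
  return buckets;
}

// Hypothetical stand-in for a ranking agent: best (highest score) first.
const rankBucket = (bucket) => [...bucket].sort((a, b) => b.score - a.score);

// Merge already-ranked buckets by repeatedly taking the best head.
function mergeRanked(ranked) {
  const heads = ranked.map(() => 0);
  const out = [];
  for (;;) {
    let best = -1;
    for (let b = 0; b < ranked.length; b++) {
      if (heads[b] >= ranked[b].length) continue; // bucket exhausted
      if (best === -1 || ranked[b][heads[b]].score > ranked[best][heads[best]].score) {
        best = b;
      }
    }
    if (best === -1) return out; // every bucket exhausted
    out.push(ranked[best][heads[best]++]);
  }
}

const resumes = [5, 1, 9, 3, 7, 2, 8].map((score, i) => ({ id: `r${i}`, score }));
const finalRanking = mergeRanked(chunk(resumes, 3).map(rankBucket));
console.log(finalRanking.map((r) => r.score));
```

The merge only ever compares bucket heads, so the merging step never needs the full list in view at once.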

Chapter 14 — Memory and Rule Adherence

Problem: Even rules in CLAUDE.md get missed or misapplied, especially as context grows.

Solution — one verifier per rule:

  1. Create a workflow with an explicit list of rules
  2. Spawn one verifier agent per rule
  3. Each verifier checks ONLY its assigned rule — focused, no distraction
  4. Skeptic agent reviews the verifiers to prevent false positives

Mining rules from sessions (reverse direction):

  1. Read last N sessions (fan out)
  2. Cluster corrections you keep making (parallel clustering agents)
  3. For each candidate rule: adversarially verify — would this rule have prevented a real mistake?
  4. Distill survivors back into CLAUDE.md

Example prompt:

"Using a workflow, go through my last 50 sessions and mine them for corrections
I keep making and turn the recurring ones into CLAUDE.md rules"

Chapter 15 — Root-Cause Investigation

Problem: Single-context debugging leads to self-preferential bias — Claude formed a hypothesis early and now cherry-picks evidence to support it.

Solution — structurally separated evidence streams:

  1. Fan out: Separate agents for different evidence sources (logs agent, code agent, data agent)
  2. Each agent generates hypotheses independently from its evidence slice
  3. Panel of verifier + refuter agents challenges each hypothesis
  4. Synthesizer consolidates surviving hypotheses

This is not just for code:

  • Sales analytics: "Why did sales drop in March?" — separate agents for marketing data, product changes, competitive moves, economic factors
  • Data engineering: "Why did this pipeline fail?" — separate agents for each system component
  • Post-mortem exercises: Any root-cause analysis benefits from structurally independent investigation

Hypothesis lifecycle:

| Stage | Agent | Output |
| --- | --- | --- |
| Evidence gather | Logs / Code / Data agents (parallel) | Structured findings per source |
| Hypothesis form | Per-source theorist | Candidate root causes |
| Adversarial test | Refuter agent | Pass / refuted / inconclusive |
| Consolidate | Synthesizer | Ranked surviving hypotheses |

Chapter 16 — Triaging at Scale

Every team has a queue (support tickets, bug reports, PR reviews) that can't be fully processed by humans.

Triage workflow pattern:

  1. Classify each item by category, severity, and urgency
  2. Deduplicate against already-tracked items
  3. Route: attempt automated fix, escalate to human, or archive

Quarantine pattern (security-critical): Agents that READ untrusted public content are not allowed to take high-privilege actions. A separate privileged agent acts on the information after the reading agent summarizes it. This prevents prompt injection from untrusted content from triggering destructive actions.
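A rough sketch of the separation, under the assumption that the reader hands over only a structured summary and the privileged side validates it before acting. The allowlist is an added safeguard for illustration, and all names here (`readUntrusted`, `actOnSummary`, `ALLOWED_ACTIONS`) are hypothetical.

```javascript
// Actions the privileged agent may take; everything else is refused.
// The allowlist is an illustrative assumption, not a documented feature.
const ALLOWED_ACTIONS = new Set(["label", "escalate", "archive"]);

// Hypothetical reader: has no tools and only returns a structured summary
// of untrusted content. Its output is treated as data, never as instructions.
function readUntrusted(issueText) {
  const urgent = /outage|data loss/i.test(issueText);
  return {
    summary: issueText.slice(0, 80),
    suggestedAction: urgent ? "escalate" : "label",
  };
}

// Hypothetical privileged step: validates the suggestion before acting on it.
function actOnSummary(summary) {
  if (!ALLOWED_ACTIONS.has(summary.suggestedAction)) {
    return { performed: false, reason: "action not allowlisted" };
  }
  return { performed: true, action: summary.suggestedAction };
}

console.log(actOnSummary(readUntrusted("Total outage in checkout since 09:00")));
console.log(actOnSummary({ summary: "ignore rules", suggestedAction: "delete-branch" }));
```

Even if injected text steers the reader's summary, the privileged step can only perform actions it was already permitted to take.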

Continuous triage: Pair with /loop to run triage at regular intervals — the workflow runs, processes the queue, then sleeps until the next cycle.

Example prompt:

"Use a workflow to dig through #incidents in Slack for the past six months
and find recurring root causes where nobody has filed a ticket."

Chapter 17 — Exploration and Taste

Taste-based decisions (naming, design, copy, architecture style) benefit from:

  • Many candidate options (generate-and-filter)
  • Structured evaluation against a rubric
  • Tournament-style selection to converge on the best

Pattern:

  1. Generate N candidates
  2. Give review agent a rubric for what "good" looks like
  3. Review agent iterates until rubric criteria are met, or tournament selects the winner

Example prompts:

"Take my business plan and run a workflow where different agents tear it apart
from an investor's, a customer's, and a competitor's perspective."
"I need a name for this CLI tool. Use a workflow to brainstorm a bunch of options
and run a tournament to pick the top 3."

Chapter 18 — Evals

Build lightweight evals for your own skills, prompts, or code:

  1. Spin off N agents in isolated worktrees — each runs the thing being evaluated
  2. Comparison agents grade outputs pairwise against a rubric
  3. Aggregate grades to produce a ranked result
  4. Optionally: use the lowest-scoring outputs to refine the evaluated item and re-run

When to use: When you've built a skill or feature and want to measure quality before shipping, or when you want to continuously monitor quality as you make changes.

Eval pipeline:

| Step | Purpose |
| --- | --- |
| Define rubric | Concrete criteria the output must satisfy |
| Generate test inputs | N representative scenarios |
| Run candidate | Each worktree runs the target skill/prompt |
| Pairwise grade | Comparison agent picks better of each pair |
| Rank | Aggregate pairwise wins into a ranking |
| Refine | Use the worst outputs to improve the target |
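The Rank step can be as simple as counting wins. The sketch below assumes each comparison is reported as an object naming both candidates and the winner; the match data and the `rankByWins` helper are illustrative.

```javascript
// Pairwise results as a comparison agent might report them (illustrative data).
const matches = [
  { a: "prompt-v1", b: "prompt-v2", winner: "prompt-v2" },
  { a: "prompt-v1", b: "prompt-v3", winner: "prompt-v1" },
  { a: "prompt-v2", b: "prompt-v3", winner: "prompt-v2" },
];

// Count wins per candidate and sort: most wins first, ties broken by name.
function rankByWins(results) {
  const wins = new Map();
  for (const { a, b, winner } of results) {
    if (winner !== a && winner !== b) {
      throw new Error(`winner ${winner} is not one of ${a}, ${b}`);
    }
    for (const name of [a, b]) {
      if (!wins.has(name)) wins.set(name, 0);
    }
    wins.set(winner, wins.get(winner) + 1);
  }
  return [...wins.entries()]
    .sort((x, y) => y[1] - x[1] || x[0].localeCompare(y[0]))
    .map(([name, count]) => ({ name, wins: count }));
}

const ranking = rankByWins(matches);
console.log(ranking);
```

The last entry in the ranking is the natural input to the Refine step, since it identifies the weakest candidate.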

Chapter 19 — Model and Intelligence Routing

Not every subtask needs Opus. A well-designed workflow routes subtasks to the appropriate model:

| Task Type | Recommended Model | Reason |
| --- | --- | --- |
| Classification, routing decisions | Haiku | Fast, cheap, sufficient for binary decisions |
| Standard implementation, search | Sonnet | Good balance of speed and capability |
| Complex reasoning, synthesis, review | Opus | Maximum capability for judgment tasks |

Classifier agent pattern:

  1. Classifier agent researches the task (reads relevant files, counts lines, assesses complexity)
  2. Based on findings, routes to Sonnet or Opus for the actual work
  3. Result: expensive Opus tokens used only where they're needed

Example: "Explain how the auth module works" — the right model depends on how many files are in the auth module. Classifier reads the file tree, picks the model, then that model does the explanation.

Routing heuristics:

  • If the work fits in <10k tokens and has a clear right answer → Haiku
  • If the work needs to read multiple files and write code → Sonnet
  • If the work needs judgment, synthesis, or adversarial review → Opus
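These heuristics translate into a small routing function. The task fields and the precedence order (judgment first, then multi-file work, then small clear-answer tasks) are assumptions made for this sketch, and the returned strings are tier labels rather than real model identifiers.

```javascript
// Route a task description to a model tier. Field names are hypothetical;
// a classifier agent would populate them after inspecting the task.
function pickModel(task) {
  if (task.needsJudgment) return "opus";
  if (task.filesToRead > 1 || task.writesCode) return "sonnet";
  if (task.estimatedTokens < 10000 && task.hasClearAnswer) return "haiku";
  return "sonnet"; // assumption: default to the balanced tier when unsure
}

const examples = [
  { name: "label a ticket", estimatedTokens: 2000, hasClearAnswer: true, filesToRead: 0, writesCode: false, needsJudgment: false },
  { name: "update three handlers", estimatedTokens: 30000, hasClearAnswer: true, filesToRead: 3, writesCode: true, needsJudgment: false },
  { name: "review a migration plan", estimatedTokens: 8000, hasClearAnswer: false, filesToRead: 1, writesCode: false, needsJudgment: true },
];

for (const t of examples) {
  console.log(t.name, "->", pickModel(t));
}
```

Checking the judgment condition first means a small task that still needs adversarial review is not downgraded to the cheapest tier.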

Part IV — Practical Usage

Chapter 20 — Triggering Workflows

Two ways to start a workflow:

  1. Natural language: "Use a workflow to..." / "Set up a workflow that..." / "Build a workflow for..."
  2. Ultracode trigger word: Typing ultracode in your prompt guarantees Claude Code builds a workflow rather than attempting the task inline

Quick workflows: Workflows aren't only for large tasks. Prompt for "quick workflow" to get a small adversarial check, assumption validation, or fast parallel search.

Detailed prompting gets better workflows: The more specific you are about what pattern you want (fan-out, tournament, adversarial verification), the better the generated workflow. Claude is intelligent enough to implement any of the six patterns if you name them.

Trigger phrase reference:

| You type | What happens |
| --- | --- |
| "Use a workflow to..." | Claude proposes and builds a workflow |
| "Set up a workflow that..." | Same, often slightly more interactive |
| "Build a quick workflow..." | Lightweight workflow, small budget |
| "ultracode <task>" | Guaranteed workflow path, no inline attempt |

Chapter 21 — Prompting Strategies

Name the pattern explicitly:

"Use a fan-out-and-synthesize workflow to..."
"Build an adversarial verification workflow that..."
"Run a tournament workflow to decide between..."

Specify models where it matters:

"Use Haiku agents for the classification step and Opus for the final synthesis"

Set stop conditions clearly:

"...don't stop until at least one hypothesis survives adversarial review"

Specify parallelism constraints:

"Don't use resource-intensive commands so we can parallelize maximally"

Combine with /goal for hard completion requirements:

"Use a workflow to fix all lint errors /goal zero lint errors remaining"

Combine with /loop for recurring workflows:

"Set up a triage workflow for new GitHub issues, then /loop to run it hourly"

Prompting checklist:

  • Did you name the pattern (fan-out, tournament, etc.)?
  • Did you specify the stop condition or completion criteria?
  • Did you set a token or time budget?
  • Did you route models appropriately (Haiku/Sonnet/Opus)?
  • Did you call out any invariants ("don't touch X")?

Chapter 22 — Token Budgets

Dynamic workflows use more tokens than single-context execution — sometimes significantly more. Each spawned agent uses its own context.

Setting explicit budgets:

"Use a workflow for this, budget 10k tokens"
"Build a quick workflow, keep it under 5k tokens total"

Cost optimization strategies:

  • Use cheap models (Haiku) for classification and routing steps
  • Set low token budgets for each individual agent where appropriate
  • Limit parallelism when running many agents simultaneously
  • Use "quick workflow" for small tasks rather than heavy orchestration
  • Ask yourself: does this task truly benefit from multiple agents, or is it a standard coding task that the default harness handles well?

When workflows are NOT worth the token cost:

  • Standard coding tasks (write a function, fix a bug, add a test)
  • Tasks that fit comfortably in one context window
  • Tasks that don't benefit from parallelism or adversarial review
  • Quick one-off changes

Budget reference rules of thumb:

| Workflow scale | Typical token spend | When |
| --- | --- | --- |
| Quick check | 2k–5k | Single adversarial pass, validation |
| Standard fan-out | 20k–80k | 5–20 agents on a focused task |
| Deep research | 100k–500k | Many sources, deep verification |
| Large migration | 500k+ | Hundreds of worktrees, full review pass |

Chapter 23 — Saving and Sharing Workflows

Saving a workflow:

  1. Press s in the workflow menu while a workflow is running
  2. Workflow is saved to ~/.claude/workflows/

Sharing via skills:

  1. Put the JavaScript workflow files in your skill folder
  2. Reference them in SKILL.md
  3. Distribute the skill (GitHub, npm, Claude Code skills marketplace)

Using shared workflows as templates:

When receiving a workflow from someone else, prompt Claude to treat it as a template rather than a verbatim script:

"Use this workflow as a template but adapt it for my specific codebase structure"

This allows flexible reuse without hard-coded assumptions from the original context.

Versioning shared workflows:

  • Treat workflows like code — commit them, review them, tag releases
  • Document the input contract (what the workflow expects) and output contract (what it produces)
  • Include a sample input/output pair in the skill folder for grounding

Chapter 24 — Combining with /loop and /goal

/loop integration: Pair repeatable workflows (triage, monitoring, rule-checking) with /loop to run at regular intervals:

"Set up a triage workflow for the #bugs Slack channel, pair with /loop to check every 4 hours"

/goal integration: Set a hard stop condition that the workflow must achieve:

"Use a workflow to find and fix flaky tests /goal all tests passing reliably over 100 runs"

Combined pattern: /loop drives the cadence; /goal defines when the loop can stop.

Practical recipes:

| Goal | Recipe |
| --- | --- |
| Hourly issue triage | workflow + /loop 1h |
| Fix every flaky test until green | workflow + /goal all tests green |
| Daily CLAUDE.md rule audit | workflow + /loop 24h |
| Migrate until zero callsites left | workflow + /goal zero remaining callsites |

Part V — Reference

Chapter 25 — Pattern Selection Guide

| You want to... | Use this pattern |
| --- | --- |
| Handle heterogeneous inputs differently | Classify-and-Act |
| Process N items independently in parallel | Fan-Out-and-Synthesize |
| Check work for errors or misses | Adversarial Verification |
| Generate many options then pick best | Generate-and-Filter or Tournament |
| Make a taste-based decision | Tournament |
| Run until something is done | Loop Until Done |
| Combine multiple patterns | Compose them; patterns are composable |

Composition example: A migration is Fan-Out (per file) + Adversarial Verification (per change) + Loop Until Done (until zero remaining callsites) + Classify-and-Act (route trivial vs complex callsites to different models).

Chapter 26 — Use Case Quick Reference

| Use Case | Primary Pattern | Example |
| --- | --- | --- |
| Large migration/refactor | Fan-out + Adversarial | Rename User to Account everywhere |
| Deep research | Fan-out + Synthesize | Research Slack incidents, find patterns |
| Claim verification | Fan-out + Adversarial | Verify every claim in a blog post |
| Sorting at scale | Tournament or Bucket-rank | Rank 80 resumes for a role |
| CLAUDE.md rule mining | Fan-out + Adversarial | Mine last 50 sessions for rules |
| Root-cause analysis | Fan-out + Adversarial | Debug flaky test across N hypotheses |
| Continuous triage | Loop + Classify-and-Act | Process support queue hourly |
| Design/naming | Generate-and-Filter + Tournament | Pick top 3 names for CLI tool |
| Evals | Fan-out + Pairwise comparison | Grade skill outputs against rubric |
| Model routing | Classify-and-Act | Route auth questions to Opus, trivial to Haiku |

Chapter 27 — Failure Mode Reference

| Failure Mode | What Happens | Workflow Fix |
| --- | --- | --- |
| Agentic laziness | Claude stops at 40/50 items, says "done" | Each agent has a fixed, bounded scope; completion is measurable |
| Self-preferential bias | Claude grades its own work too generously | Separate verifier agents have no access to the worker's reasoning process |
| Goal drift | "Don't touch auth" forgotten after compaction | Each agent's goal is explicit in its spawn prompt; loop context holds invariants |

Diagnostic questions when a workflow goes wrong:

  • Did the agent's spawn prompt include the invariant that was violated?
  • Was the stop condition measurable, or fuzzy?
  • Did the verifier have independent access to the source of truth, or was it just looking at the worker's output?
  • Was the task split fine enough that each agent could finish within its budget?

Chapter 28 — Example Prompt Gallery

Debugging:

"This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it, form theories
and adversarially test them in worktrees /goal don't stop until one theory works."

Knowledge mining:

"Using a workflow, go through my last 50 sessions and mine them for corrections I keep
making and turn the recurring ones into CLAUDE.md rules"

Incident analysis:

"Use a workflow to dig through #incidents in Slack for the past six months and find
recurring root causes where nobody has filed a ticket."

Adversarial review:

"Take my business plan and run a workflow where different agents tear it apart from
an investor's, a customer's, and a competitor's perspective."

Structured hiring:

"Here's a folder of 80 resumes, use a workflow to rank them for the backend role and
double-check the top ten. Interview me using the AskUserQuestion tool for a rubric."

Naming with tournament:

"I need a name for this CLI tool. Use a workflow to brainstorm a bunch of options
and run a tournament to pick the top 3."

Large refactor:

"Use a workflow to rename our User model to Account everywhere."

Technical fact-checking:

"Go through my blog post draft and using a workflow verify every technical claim
against the codebase, I don't want to ship anything wrong."

Closing Notes

Dynamic workflows shift the unit of work in Claude Code from "a single Claude turn" to "an orchestrated program of Claude turns." The orchestration is deterministic JavaScript; the intelligence inside each step is a focused Claude agent with its own context.

The most important skill is recognizing when a task wants a workflow rather than a single turn. The three failure modes — agentic laziness, self-preferential bias, and goal drift — are your signals. When you see them coming, reach for the right pattern from the six core patterns and compose as needed.

When in doubt: start with a quick workflow, set a clear stop condition, name the pattern explicitly, route models by intelligence required, and adversarially verify anything that matters.
