Distributed Task Reliability for AI Agents: Idempotency, Retries & Queues
About
Load when building or reviewing task queues, retries, idempotency keys, dead-letter queues, scheduled/cron jobs, exactly-once vs at-least-once delivery, transactional outbox/CDC, circuit breakers, or distributed notification delivery. Routes each signal or task to the chapter that addresses it.
Preview
Load this book when:
- You are building or reviewing a task queue, worker, or message consumer (Celery, RabbitMQ, Kafka, SQS, Sidekiq, BullMQ).
- You are writing retry/backoff logic and need to avoid retry storms or duplicate side effects.
- You are designing a scheduled job (cron, Airflow, Kubernetes CronJob) that could run twice on failover.
- You are deciding between at-least-once and exactly-once delivery, or implementing an idempotent consumer.
- You are wiring a transactional outbox / CDC, or you spot a dual-write to two systems.
- You are adding a dead-letter queue, poison-message isolation, circuit breaker, timeout, or bulkhead.
- You are sending notifications/emails/webhooks and need idempotent, reliable delivery.
This is an agent-readable reference. It exists to be routed into, not read linearly. Every distributed task eventually fails halfway through, and the network cannot tell you whether the work happened. This book is the rule set for building task-processing code that stays correct anyway.
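The failure mode described above can be made concrete with a small sketch. It is illustrative only and is not taken from the book's chapters: `PaymentService`, `AmbiguousFailure`, `call_with_retries`, and the key `"order-42"` are all hypothetical names. The assumption is a downstream service that remembers which idempotency keys it has already applied, so a retry after a lost response does not repeat the side effect.

```python
import random
import time


class AmbiguousFailure(Exception):
    """The caller cannot tell whether the work was applied (e.g. a lost response)."""


class PaymentService:
    """Hypothetical downstream service that deduplicates by idempotency key."""

    def __init__(self):
        self.applied = {}        # idempotency key -> stored result
        self.charge_count = 0    # number of real side effects performed
        self._drop_next_response = True  # simulate one lost response

    def charge(self, key, amount):
        # A key seen before means the work already happened: return the
        # stored result instead of charging again.
        if key in self.applied:
            return self.applied[key]
        self.charge_count += 1
        self.applied[key] = f"charged {amount}"
        if self._drop_next_response:
            # The side effect happened, but the caller never hears about it.
            self._drop_next_response = False
            raise AmbiguousFailure("response lost after the charge was applied")
        return self.applied[key]


def call_with_retries(fn, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Retry fn on ambiguous failures, using exponential backoff with full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except AmbiguousFailure:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the failure
            sleep(random.uniform(0, base_delay * 2 ** attempt))


service = PaymentService()
# The same key is sent on every attempt, so the retry is safe to repeat.
result = call_with_retries(
    lambda: service.charge("order-42", 100),
    sleep=lambda seconds: None,  # skip real sleeping in this demonstration
)
```

In this sketch the first attempt applies the charge and then loses its response; the retry carries the same key, so the service returns the stored result and `charge_count` stays at 1. Without the key lookup, the same retry would have charged twice.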
Source provenance. Claims are grounded in three books already in the user's library, cited inline as Title, p.N:
- Designing Data-Intensive Applications (DDIA), Martin Kleppmann.
- Release It!, Michael Nygard.
- Site Reliability Engineering (SRE), Beyer, Jones, Petoff, Murphy (Google).
Where a claim is the author's synthesis rather than a sourced fact, it is stated plainly as such.
How to use this book
- Read this page (loading triggers) and Chapter 2 — Decision Matrix.
- From the matrix, jump to the chapters your task maps to. Do not read top to bottom.
- Each chapter is one page, self-contained, and opens with its own load trigger. It restates the minimal context its rules need and cross-references other chapters by number.
- Before shipping task-processing code, run Chapter 13 — Self-Audit Checklist against it.
Each content chapter is dual-layer: a Narrative for the human deciding whether to adopt the practice, and an Agent rules block (RULE / WHY / APPLY) for the agent executing it. If you read only one, read the rules.