CandleKeep

Distributed Task Reliability for AI Agents: Idempotency, Retries & Queues

Tags: distributed-systems, reliability, idempotency, retries, message-queues, dead-letter-queues, exactly-once, transactional-outbox, circuit-breaker, cron-jobs, backend, agent-reference
Pages: 13
Format: markdown
Listed: June 16, 2026
Updated: June 15, 2026
Subscribers: 1

About

Load when building or reviewing task queues, retries, idempotency keys, dead-letter queues, scheduled/cron jobs, exactly-once vs at-least-once delivery, transactional outbox/CDC, circuit breakers, or distributed notification delivery. Routes each signal or task to the chapter that addresses it.

13 Chapters
37 Topics
13 Pages

Preview

Distributed Task Reliability for AI Agents: Idempotency, Retries & Queues

Load this book when:

  • You are building or reviewing a task queue, worker, or message consumer (Celery, RabbitMQ, Kafka, SQS, Sidekiq, BullMQ).
  • You are writing retry/backoff logic and need to avoid retry storms or duplicate side effects.
  • You are designing a scheduled job (cron, Airflow, or Kubernetes CronJob) that could run twice on failover.
  • You are deciding between at-least-once and exactly-once delivery, or implementing an idempotent consumer.
  • You are wiring a transactional outbox / CDC, or you spot a dual-write to two systems.
  • You are adding a dead-letter queue, poison-message isolation, circuit breaker, timeout, or bulkhead.
  • You are sending notifications/emails/webhooks and need idempotent, reliable delivery.

This is an agent-readable reference. It exists to be routed into, not read linearly. Every distributed task eventually fails halfway through, and the network cannot tell you whether the work happened. This book is the rule set for building task-processing code that stays correct anyway.
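To make the "fails halfway through" problem concrete, here is a minimal Python sketch of a caller retrying a request whose reply was lost after the work had already been done. It is an illustration under stated assumptions, not code from the book: `FlakyPaymentService`, `charge_with_retry`, and the in-memory dictionary standing in for a durable dedupe store are all hypothetical names, and backoff between attempts is left out for brevity.

```python
import uuid


class FlakyPaymentService:
    """Hypothetical downstream service: it applies the charge, then loses the first reply."""

    def __init__(self):
        self.charges = {}  # idempotency_key -> amount (stand-in for a durable store)
        self._drop_next_reply = True

    def charge(self, idempotency_key, amount):
        # Deduplicate on the key: a repeated request records nothing new.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = amount
        if self._drop_next_reply:
            # The work happened, but the caller never hears about it.
            self._drop_next_reply = False
            raise TimeoutError("reply lost after the work was done")
        return self.charges[idempotency_key]


def charge_with_retry(service, idempotency_key, amount, max_attempts=3):
    """Retry on timeout, reusing the SAME key so a duplicate attempt is harmless."""
    for attempt in range(1, max_attempts + 1):
        try:
            return service.charge(idempotency_key, amount)
        except TimeoutError:
            # The caller cannot tell whether the charge was applied, so it retries.
            if attempt == max_attempts:
                raise


service = FlakyPaymentService()
key = str(uuid.uuid4())  # generated once per logical operation, not once per attempt
result = charge_with_retry(service, key, 25)
```

The first attempt times out even though the charge was recorded; the retry succeeds and the service still holds exactly one charge, because the key is created once per operation and reused on every attempt.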

Source provenance. Claims are grounded in three books already in the user's library, cited inline as Title, p.N:

  • Designing Data-Intensive Applications (DDIA), Martin Kleppmann.
  • Release It!, Michael Nygard.
  • Site Reliability Engineering (SRE), Beyer, Jones, Petoff, Murphy (Google).

Where a claim is the author's synthesis rather than a sourced fact, it is stated plainly as such.

How to use this book

  1. Read this page (loading triggers) and Chapter 2 — Decision Matrix.
  2. From the matrix, jump to the chapters your task maps to. Do not read top to bottom.
  3. Each chapter is one page, self-contained, and opens with its own load trigger. It restates the minimal context its rules need and cross-references other chapters by number.
  4. Before shipping task-processing code, run Chapter 13 — Self-Audit Checklist against it.

Each content chapter is dual-layer: a Narrative for the human deciding whether to adopt the practice, and an Agent rules block (RULE / WHY / APPLY) for the agent executing it. If you read only one, read the rules.


Table of Contents

Chapter 2 — Decision Matrix
