Message Queues, Brokers, and Streams
The Problem: Synchronous Work in Request Handlers
User places an order. The API handler needs to:
1. Save the order to DB
2. Charge payment
3. Send confirmation email
4. Update inventory
5. Notify warehouse
If done synchronously, the user waits 5-10 seconds. If step 4 fails, steps 1-3 already ran. The user only needs to know "order placed" -- the rest can happen later.
Delegation: hand off work to be done later, by someone else.
Without delegation:
```
User → API: do steps 1-5 → wait... wait... → "done"
```
With delegation:
```
User → API: do step 1, hand off 2-5 → "done" (fast)
                            ↓
             workers pick up 2-5 in the background
```
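In code, the handler does only step 1 synchronously and enqueues the rest. A minimal sketch using Amazon SQS via boto3; the queue URL and `save_order_to_db` are stand-ins:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/order-tasks"  # placeholder

def place_order(order):
    order_id = save_order_to_db(order)  # step 1: the only synchronous work
    # hand off steps 2-5: workers consume this message in the background
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"order_id": order_id,
                                "tasks": ["charge", "email", "inventory", "warehouse"]}),
    )
    return {"status": "order placed", "order_id": order_id}  # returns immediately
```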
Why Not Just Background Threads?
Two problems:
1. Durability -- if the app server crashes, in-memory tasks are lost forever
2. Scale -- 100 users × 5 background tasks = resource exhaustion
A queue is external to the application. Work survives crashes.
```
Background thread:  app memory: [task1, task2] → crash → all lost
Queue:              app → writes to queue → crash → tasks still in queue → another worker picks up
```
Terminology
- Broker = the system that sits between producers and consumers (RabbitMQ, Kafka, SQS)
- Queue = a delivery pattern (consume and delete)
- Stream = a delivery pattern (consume and retain)
A broker can implement queues, streams, or both.
Without broker (tightly coupled):
```
Producer → Consumer A
Producer → Consumer B    (producer must know all consumers)
```
With broker (decoupled):
```
Producer → Broker → Consumer A
                  → Consumer B    (producer doesn't care who consumes)
```
```mermaid
graph LR
    P1[Producer 1] --> B[Broker]
    P2[Producer 2] --> B
    B --> Q1[Queue A]
    B --> Q2[Queue B]
    Q1 --> C1[Consumer A]
    Q2 --> C2[Consumer B]
    Q2 --> C3[Consumer C]
```
Queue Model
Messages are deleted once consumed. One message → one consumer.
Acknowledgments (Acks)
Queues don't just pop messages. They use ack-based delivery:
- Worker receives message (message becomes invisible to others, NOT deleted)
- Worker processes it
- Worker sends ack ("done, delete it")
If worker crashes before acking, the queue waits for a visibility timeout, then makes the message visible again for another worker.
```
Worker 1: receives msg1 → msg1 invisible → crash!
              (timeout expires)
          → msg1 visible again
Worker 2: receives msg1 → processes → ack → msg1 deleted
```
```mermaid
sequenceDiagram
    participant P as Producer
    participant Q as Queue
    participant W as Worker
    P->>Q: enqueue(msg)
    Q-->>W: deliver msg (msg now invisible)
    W->>W: process msg
    alt success
        W->>Q: ack
        Q->>Q: delete msg
    else crash / timeout
        Q->>Q: visibility timeout expires
        Q->>Q: msg visible again
        Q-->>W: redeliver to another worker
    end
```
```mermaid
stateDiagram-v2
    [*] --> Queued: producer enqueues
    Queued --> InFlight: delivered to worker
    InFlight --> Acked: worker sends ack
    InFlight --> Queued: timeout (no ack)
    Acked --> [*]: deleted from queue
```
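The same loop in code, again sketched against SQS (`process` is a stand-in handler). `receive_message` hides the message for `VisibilityTimeout` seconds; `delete_message` is the ack:

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/order-tasks"  # placeholder

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,    # long polling
        VisibilityTimeout=30,  # invisible to other workers for 30s
    )
    for msg in resp.get("Messages", []):
        try:
            process(msg["Body"])  # stand-in for the real work
        except Exception:
            continue  # no ack sent -> msg becomes visible again after timeout
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])  # the ack
```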
Fan-Out (Multiple Services)
If multiple services need the same message, each gets its own queue:
```
                    ┌──→ Queue A → Order Service
Producer → Broker ──┤
                    ├──→ Queue B → Email Service
                    └──→ Queue C → Analytics Service
```
RabbitMQ supports this with "exchanges" that copy messages into multiple queues.
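A sketch with pika (exchange and queue names are illustrative): declare a fanout exchange, bind one queue per service, and every publish is copied into all of them:

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()

ch.exchange_declare(exchange="orders", exchange_type="fanout")
for q in ("order-service", "email-service", "analytics-service"):
    ch.queue_declare(queue=q, durable=True)
    ch.queue_bind(queue=q, exchange="orders")

# one publish -> one copy in each bound queue (fanout ignores the routing key)
ch.basic_publish(exchange="orders", routing_key="", body=b'{"order_id": 123}')
conn.close()
```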
Tools: RabbitMQ, Amazon SQS, Redis (with lists), Azure Service Bus
Stream Model
Messages stay after consumption. Consumers track their own position (offset). Multiple consumers read the same messages independently. Any consumer can rewind.
```
[msg1, msg2, msg3, msg4, msg5, msg6, ...]
  ↑                        ↑           ↑
  │                        │           └─ Service B (offset 6)
  │                        └─ Service A (offset 4)
  └─ new Service C (offset 0) -- reads everything from the start
```
No fan-out configuration needed. New service just starts reading the same topic.
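With kafka-python, a new service replays history just by picking its own group id and starting offset (topic and broker address are illustrative):

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="new-service-c",      # its own consumer group -> its own offsets
    auto_offset_reset="earliest",  # no stored offset? start from offset 0
)
for record in consumer:
    print(record.offset, record.value)  # position is tracked per consumer group
```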
```mermaid
graph LR
    P[Producer] --> L[Append-Only Log]
    L --- O0["offset 0: msg1"]
    L --- O1["offset 1: msg2"]
    L --- O2["offset 2: msg3"]
    L --- O3["offset 3: msg4"]
    L --- O4["offset 4: msg5"]
    L --- O5["offset 5: msg6"]
    O2 -.->|"reading here"| GA["Service A (offset 2)"]
    O4 -.->|"reading here"| GB["Service B (offset 4)"]
    O0 -.->|"replay from start"| GC["Service C (offset 0)"]
```
Tools: Apache Kafka, Amazon Kinesis, Redpanda, Apache Pulsar
Queue vs Stream -- When to Use Which
| Aspect | Queue (RabbitMQ, SQS) | Stream (Kafka) |
|---|---|---|
| Message consumed once? | Yes (per queue) | No, stays around |
| New consumer needs history? | Gone | Start from any offset |
| Multiple services, same message? | Fan-out (separate Qs) | Same topic |
| Simple task delegation? | Great fit | Overkill |
| Event sourcing / audit trail? | Not designed for it | Perfect |
Delivery Guarantees
At-Most-Once
Queue deletes message on handover. No ack. If worker crashes, message is lost. Fast but lossy.
```mermaid
sequenceDiagram
    participant Q as Queue
    participant W as Worker
    Q->>W: deliver msg (delete immediately)
    W->>W: processing...
    W--xW: crash!
    Note over Q,W: msg is lost -- queue already deleted it
```
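This is what a bare Redis list gives you. A sketch where the pop itself deletes the message (`process` is a stand-in):

```python
import redis

r = redis.Redis()

while True:
    _key, body = r.brpop("tasks")  # pop = deliver AND delete, atomically
    process(body)                  # crash here -> the message is gone
```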
At-Least-Once
Queue redelivers if no ack received (visibility timeout). Never loses messages, but worker might process the same message twice.
```mermaid
sequenceDiagram
    participant Q as Queue
    participant W1 as Worker 1
    participant W2 as Worker 2
    Q->>W1: deliver msg (keep in queue, invisible)
    W1->>W1: processing...
    W1--xW1: crash!
    Note over Q: visibility timeout expires
    Q->>W2: redeliver msg
    W2->>W2: process msg
    W2->>Q: ack
    Note over Q,W2: msg processed (possibly twice if W1 partially succeeded)
```
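Redis can approximate this with a second list as the in-flight store. A sketch where `LREM` plays the role of the ack (a reaper that re-queues stuck entries is omitted):

```python
import redis

r = redis.Redis()

while True:
    # move to "processing" instead of deleting -- the in-flight copy survives a crash
    body = r.blmove("tasks", "processing", timeout=0)
    try:
        process(body)                  # stand-in for the real work
        r.lrem("processing", 1, body)  # the ack: drop the in-flight copy
    except Exception:
        pass  # copy stays in "processing" until a reaper re-queues it
```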
Exactly-Once
True exactly-once delivery is impossible between two independent systems: the queue can't distinguish a worker that finished but whose ack was lost from a worker that genuinely crashed, so it must either redeliver (risking a duplicate) or not (risking a loss).
Solution: at-least-once delivery + idempotent consumers = effectively exactly-once.
When Kafka advertises "exactly-once semantics," it is doing this under the hood: idempotent producers plus transactions, scoped to the Kafka ecosystem.
Idempotency Patterns
Since at-least-once delivery means duplicates are possible, workers must be idempotent:
NOT idempotent (dangerous):
"Add $50 to balance" → processed twice = $100
Idempotent (safe):
"Set order #123 status to SHIPPED" → twice = still SHIPPED
"Charge with idempotency_key=abc" → payment provider rejects duplicate
Common patterns:
- Idempotency keys -- unique ID per message, check before processing
- Upserts -- "insert or update if exists" instead of "insert"
- State transitions -- "set status to X" not "increment counter"
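A sketch of the idempotency-key pattern with SQLite (schema and names are illustrative). Storing the key and applying the state change in one transaction closes the gap where a crash between "process" and "store key" would cause a duplicate:

```python
import sqlite3

db = sqlite3.connect("orders.db")
db.execute("CREATE TABLE IF NOT EXISTS processed (message_id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, status TEXT)")

def handle(message_id, order_id, status):
    try:
        with db:  # one transaction: key insert + state change commit together
            db.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
            db.execute("UPDATE orders SET status = ? WHERE id = ?", (status, order_id))
    except sqlite3.IntegrityError:
        pass  # duplicate delivery: key already stored, work already applied
    # ack either way so the queue deletes the message
```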
```mermaid
flowchart TD
    A[Receive message] --> B{Seen this idempotency key before?}
    B -->|No| C[Process message]
    C --> D[Store idempotency key in DB]
    D --> E[Send ack]
    B -->|Yes| F[Skip processing]
    F --> E
```