Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
Queues vs streams: the fundamental split
PaySetu's payments team ran a single RabbitMQ cluster for three years. Every event in the company — transaction-attempted, transaction-settled, fraud-flagged, refund-issued — landed on a queue, got picked up by one of N workers, was processed, and was acknowledged. The queue then deleted the message. The architecture was simple, the latency was good, and nobody complained until the analytics team showed up. Analytics wanted to count fraud-flagged events per merchant per hour. The data was already gone — RabbitMQ had handed each message to one fraud worker, that worker had updated a Postgres row, and the message itself was deleted. To get analytics, the team had to add a second consumer per queue. Then a third — for the data lake. Then a fourth — for the ML feature store. Each new consumer needed its own queue, its own fanout exchange, its own bookkeeping. By month six the broker config was 4,000 lines of YAML and the team had reinvented, badly, the thing they should have used in the first place: a stream.
The split between queues and streams is the single most consequential decision in event-driven architecture, and almost every team gets it wrong on the first build. The mistake is symmetric — teams pick a queue when they need a stream and end up with a fanout-exchange tarpit, or pick a stream when they need a queue and end up paying log-storage costs to model a job board. This chapter is about why the two primitives are genuinely different, why you cannot fake one with the other beyond a certain scale, and how to recognise which one your workload wants before you have shipped 4,000 lines of YAML.
A queue is a work-distribution primitive: each message is delivered to one consumer in a group and then deleted. A stream is a replayable log: each message is appended to an ordered log and every consumer reads it at its own offset, with no deletion until a retention policy fires. Queues optimise for throughput-per-message and bounded memory; streams optimise for fanout, replay, and historical analysis. RabbitMQ, SQS, and ActiveMQ are queues. Kafka, Pulsar, Redpanda, and Kinesis are streams. The two cannot be substituted for each other beyond toy scale — pick wrong and you will end up rebuilding the missing primitive on top.
The two primitives, mechanically
Strip both systems to their essentials and they look like two different data structures behind a network API. A queue is a multi-producer, multi-consumer FIFO with destructive read — pop() removes the message. A stream is a multi-producer, multi-consumer append-only log with non-destructive read — read(offset) returns the message at that offset and leaves it in place. Everything else — partitioning, persistence, durability, ordering — is layered on top of these two cores.
The destructive-read property of a queue is the source of every queue feature you have used. Acks and redelivery exist because if the consumer crashes after popping but before processing, the message must be returned to the queue. Visibility timeouts exist for the same reason. Dead-letter queues exist to absorb messages that cannot be processed. Every queue feature is a defence around the moment the message is removed from the queue and the consumer is solely responsible for it.
The non-destructive-read property of a stream is the source of every stream feature. Consumer offsets exist because the broker no longer tracks "who has consumed what" — the consumer tracks its own position. Retention policies exist because messages are not deleted on read; something else has to delete them eventually. Compacted topics exist because some streams want to keep only the latest value per key and let the broker garbage-collect older versions. Every stream feature is a consequence of "the broker holds the data, the consumer holds the cursor".
Why this distinction is structural and not an implementation detail: in a queue, ownership of the message transfers to the consumer at pop time. In a stream, ownership of the message stays with the broker forever — the consumer only holds a position. This changes the failure model. A queue must implement at-least-once via ack/redeliver because the message has moved; if the consumer dies, only the broker's retry timer can resurrect it. A stream gets at-least-once almost for free because the consumer can simply re-read from its last committed offset; the message never moved. The same difference makes streams trivially fan-outable (every consumer has its own offset, no coordination needed) and queues trivially load-balanceable (the broker hands each message to whichever worker is free). You cannot have both properties from the same data structure — they are mutually exclusive consequences of "where does the message live after a read".
What each one is genuinely good at
Queues win when the workload is work distribution with no replay: a job arrives, exactly one worker should do the work, the work is done, the job is forgotten. PaySetu's refund-processing queue is this shape. A refund is requested, one of N refund workers picks it up, calls the bank API, updates the ledger, acks. If a worker dies mid-refund, the broker redelivers to another worker. There is exactly one refund per request, and once the refund is in the ledger nobody needs the original message again. Queues are also wonderful for backpressure — if work piles up, the queue grows, producers feel pushback, the system gracefully throttles.
Streams win when the workload is fanout, replay, or historical analysis: an event happens, many consumers care about it, and they care for different reasons at different paces. CricStream's match-event topic is this shape. When a wicket falls during a cricket match, the score-card service updates the live UI, the notification service pings 25M viewers' phones, the analytics service updates per-match aggregates, the betting-odds service recomputes prices, and the data lake archives the event for next-month's commentary highlights. Five consumers, five offsets, one log. If a new service launches next quarter — say, a fan-engagement service that ranks comments — it can replay the last 30 days of match-event data from offset 0 and bootstrap its own state. Try doing that with a queue.
The crossover case — and the one most teams mishandle — is when a workload that looks like work distribution turns out to need replay later. KapitalKite's order-routing service is the canonical example. Each order is a job: pick the best exchange, route it, ack. Sounds like a queue. Six months in, compliance asks for a way to replay the last 90 days of orders to demonstrate best-execution. The queue is gone — every order was acked and deleted on processing. The team rebuilds the pipeline on Kafka. This is the silent cost of picking wrong: not a 10% performance hit, but a six-month architecture rewrite.
Why "I might need replay later" is not a reason to default to streams: storage is not free. A stream that retains 7 days of data at PaySetu's 50K-events-per-second peak with 1KB messages stores ~30 TB across replicas — a real per-month cloud bill. A pure work-queue workload with average in-flight time of 200 ms stores around 10 GB, three orders of magnitude less. The right framing is "is replay or fanout a real requirement, or am I imagining one?" If the answer is "real" — analytics already exists, more than one consumer is already coupled to the data — pick a stream. If the answer is "maybe someday", the cheaper-to-migrate-later move is often a queue today plus a CDC stream off the system-of-record database when the second consumer materialises. The stream cost is real and proportional to retention × throughput; do not pay it preemptively.
A code walkthrough — the same workload, both primitives
To make the difference concrete, here is the same fraud-detection workload built two ways. The first version uses a queue (queue.Queue standing in for RabbitMQ/SQS); the second uses a stream (a list with offsets standing in for Kafka). Watch what changes when a second consumer — analytics — needs the same data.
# Same workload — fraud detection on payment events — both primitives.
import queue, threading, collections

# ---------- VERSION 1: QUEUE ----------
def queue_demo():
    q = queue.Queue()
    fraud_results = []

    def fraud_worker():
        while True:
            evt = q.get()      # destructive read: the message leaves the queue here
            if evt is None:
                return         # sentinel shuts the worker down
            risky = evt["amount"] > 50000
            fraud_results.append((evt["id"], "FLAG" if risky else "OK"))
            q.task_done()

    t = threading.Thread(target=fraud_worker); t.start()
    for i, amt in enumerate([100, 80000, 500, 60000, 200], start=1):
        q.put({"id": f"txn-{i}", "amount": amt})
    q.put(None); t.join()
    print("QUEUE fraud:", fraud_results)
    # Now analytics shows up wanting to count txns/hour. Data is GONE.
    print("QUEUE analytics-able-to-recover-history?", False)

# ---------- VERSION 2: STREAM ----------
def stream_demo():
    log = []                   # the append-only log

    def append(evt):
        log.append(evt)

    def consume_from(offset, fn):   # non-destructive read: the log is untouched
        results = []
        while offset < len(log):
            results.append(fn(log[offset])); offset += 1
        return results, offset

    for i, amt in enumerate([100, 80000, 500, 60000, 200], start=1):
        append({"id": f"txn-{i}", "amount": amt})
    fraud_fn = lambda e: (e["id"], "FLAG" if e["amount"] > 50000 else "OK")
    fraud, _ = consume_from(0, fraud_fn)

    # Analytics shows up later — independent offset, replays from 0.
    bucket = collections.Counter()
    def analyse(e):
        bucket[e["amount"] > 50000] += 1
        return e["id"]
    _, _ = consume_from(0, analyse)

    print("STREAM fraud:", fraud)
    print("STREAM analytics: high-value=%d, normal=%d" % (bucket[True], bucket[False]))
    print("STREAM analytics-able-to-recover-history?", True)

queue_demo(); stream_demo()
Realistic output:
QUEUE fraud: [('txn-1', 'OK'), ('txn-2', 'FLAG'), ('txn-3', 'OK'), ('txn-4', 'FLAG'), ('txn-5', 'OK')]
QUEUE analytics-able-to-recover-history? False
STREAM fraud: [('txn-1', 'OK'), ('txn-2', 'FLAG'), ('txn-3', 'OK'), ('txn-4', 'FLAG'), ('txn-5', 'OK')]
STREAM analytics: high-value=2, normal=3
STREAM analytics-able-to-recover-history? True
Walk through the load-bearing pieces. q.get() plus q.task_done() is the destructive-read pattern: the message is gone from the queue the moment a worker picks it up. Add a second consumer (analytics_worker) and it would compete for messages with fraud_worker, halving each one's input — the queue cannot fan out without a separate copy of the queue. log.append is the non-destructive-read pattern: the message stays in the log forever (until retention kicks in). consume_from(0, ...) is offset-based reading — each consumer specifies where to start, and the log is unchanged after the read. Adding analytics is a few lines of code; analytics replays from offset 0 and produces its result independently. The stream version's analytics-able-to-recover-history? line is the punchline: history is retrievable, because nothing was deleted on the fraud worker's read. The same workload, the same five messages, two completely different operational universes.
Beyond a certain scale you cannot fake one primitive with the other. You can emulate a queue on top of a stream by treating the consumer offset as a job marker (Kafka calls this a "consumer group"), but you lose the queue's clean ack/redeliver semantics for partial-failure work and gain a per-key locking problem. You can emulate a stream on top of queues with fanout exchanges and one queue per consumer (RabbitMQ supports this), but you lose retention, replay, and the ability to onboard a new consumer six months later without coordinating a backfill. The emulations work for prototypes; they do not work for production at PaySetu's 50K-events-per-second peak, because the broker's coordination overhead grows superlinearly in the number of synthetic per-consumer queues.
Why faking a queue on top of a stream specifically breaks down for partial-failure work: a queue's ack model lets a single message fail and be retried independently of every other message in the queue — the dead-letter path catches the one bad message, the rest of the queue keeps flowing. A consumer-group offset on a stream is a single integer per partition. If message at offset 73 fails permanently, the consumer cannot ack offset 74 and "skip" 73 — the offset is a watermark, not a set. Either the consumer blocks the partition forever, manually advances past the bad message (losing it), or writes 73 to a separate dead-letter topic and continues — and now the consumer is doing the queue's bookkeeping in application code, badly. Streams are a great primitive when the consumer can process messages in strict order with no per-message failure carve-outs; they are a terrible primitive when failures are individual and isolatable. The reason every Kafka shop eventually builds a "DLT topic + manual replay tool" is that the stream primitive does not naturally model isolated per-message failure.
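A minimal sketch of that bookkeeping, in the same in-memory style as the walkthrough above. The dead-letter list stands in for a real dead-letter topic, and the poison flag is an invented failure condition for the demo:

# Consumer-group-style loop on a stream with a hand-rolled dead-letter path.
# The offset is a single watermark per partition, so a permanently failing
# message cannot be un-acked individually; it must be copied aside and the
# watermark advanced past it.
def consume_with_dlt(log, process, dead_letters):
    offset = 0
    while offset < len(log):
        msg = log[offset]
        try:
            process(msg)
        except Exception as exc:
            # A queue would redeliver just this message; on a stream the
            # consumer must divert it itself, or the whole partition stalls.
            dead_letters.append((offset, msg, str(exc)))
        offset += 1            # the watermark only ever moves forward
    return offset

def process(msg):
    if msg.get("poison"):      # hypothetical failure condition
        raise ValueError("cannot process")

log = [{"id": 72}, {"id": 73, "poison": True}, {"id": 74}]
dlt = []
consume_with_dlt(log, process, dlt)
print("dead-lettered:", dlt)   # [(1, {'id': 73, 'poison': True}, 'cannot process')]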
A decision procedure that actually works in design review
The reason teams pick wrong is that "queue or stream?" gets argued at the level of broker brand — "we already have RabbitMQ, let's use that" — instead of at the level of workload shape. A useful design-review procedure asks four questions in order and lets the first decisive answer set the default; the procedure is sketched in code below.
- Does more than one consumer need every message? If yes — analytics, audit, a second business service — you want a stream. A queue forces fanout to be implemented broker-side as multiple physical queues, and the bookkeeping cost compounds linearly with consumers. If no, the workload is single-consumer; continue.
- Is replay or backfill a likely future requirement? If yes — compliance, ML training, a future service that does not yet exist — lean stream. The retention period is the cheap insurance against future requirements. If no, continue.
- Are individual message failures isolatable and retried independently? If yes — calls to a flaky external API, per-customer side effects, refunds — lean queue. The ack/redeliver/dead-letter path is built-in. If no — every message must be processed in strict order, no skipping — lean stream.
- Is bounded broker memory and predictable storage cost a hard requirement? If yes — small operations team, no S3 budget for tiered storage — lean queue. Queues only store in-flight work. If no — storage is cheap, the team is comfortable operating retention policies — lean stream.
Two yes answers in opposite directions means the workload genuinely wants both: stream as system-of-record, queue as transient work-distribution scratchpad fed from the stream. That hybrid pattern is common enough at scale that production architects treat it as the default, not the exception. A workload that fits cleanly in one primitive is the easier case; a workload that wants both should not be forced into one.
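The procedure is mechanical enough to encode. A minimal sketch, flattening the ordered questions into two groups of leanings; the function name and boolean parameters are invented for illustration:

# The four design-review questions as a function. Returns "both" when the
# answers pull in opposite directions, per the hybrid pattern above.
def pick_primitive(multi_consumer, replay_likely,
                   failures_isolatable, bounded_storage_required):
    stream_lean = multi_consumer or replay_likely                  # questions 1 and 2
    queue_lean = failures_isolatable or bounded_storage_required   # questions 3 and 4
    if stream_lean and queue_lean:
        return "both"          # stream as system-of-record, queue fed from it
    if stream_lean:
        return "stream"
    return "queue"             # single-consumer, no replay: the cheaper default

print(pick_primitive(True, True, False, False))   # stream (CricStream match events)
print(pick_primitive(False, False, True, True))   # queue  (a pure job board)
print(pick_primitive(True, True, True, False))    # both   (PaySetu events + refund queue)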
The anti-pattern is answering these questions in the abstract, on paper, before the system has a real workload. The right time to revisit is the second time a new consumer needs to be added. The first new consumer is "we'll figure it out". The second new consumer is the moment a stream pays for itself, because the third, fourth, and fifth consumers — which you can now confidently predict — make the queue topology untenable. PaySetu's RabbitMQ-to-Kafka migration was triggered by exactly this signal: the analytics consumer was forgivable; the data-lake consumer was the second; the ML feature-store consumer was the third and the team rewrote the broker layer that quarter.
Common confusions
- "Kafka is a queue." It is a stream. Kafka's consumer group abstraction lets multiple consumers in the same group split partitions among themselves, which resembles queue semantics for that group. But the underlying log is still a stream — a second consumer group reading the same topic gets every message independently, which a queue cannot do.
- "RabbitMQ streams are the same as Kafka." RabbitMQ added a streams feature in 3.9 (2021) and it is genuinely a stream — append-only, offset-based, retention-driven. But RabbitMQ queues and RabbitMQ streams are different primitives in the same broker. The naming is unfortunate.
- "I can use Redis Pub/Sub as a queue." Redis Pub/Sub is fire-and-forget — there is no persistence, no ack, and no replay. If a subscriber is offline when a message is published, it is lost. Redis Streams (added in 5.0) is a different feature that is a real stream with offsets and persistence. Pub/Sub is not a queue and not a stream; it is a third thing.
- "At-least-once delivery means a queue." Queues and streams both deliver at-least-once. The difference is who tracks the cursor — the broker (queue) or the consumer (stream). Both can produce duplicates on consumer crash; the dedup strategy is the same — idempotent processing, dedup table, or transactional outbox.
- "A stream can always replace a queue." It can model the API, but the operational profile is different. Streams hold messages for the retention period, so storage costs scale with retention × throughput. A pure work queue with no replay need pays for storage only during the in-flight window. For a high-throughput, no-replay job board (a million jobs per minute, processed within seconds), a queue is dramatically cheaper.
- "My broker supports both, so the choice does not matter." It matters because the ergonomics push you toward one mental model. Pulsar supports queues and streams in one cluster, but the team that picks "queues by default" ships a different system from one that picks "streams by default", even on the same broker.
Going deeper
Why the LinkedIn essay called it "the log"
LinkedIn's Jay Kreps wrote a 2013 essay, The Log: What every software engineer should know about real-time data's unifying abstraction, that reframed messaging as logging. The essay's core claim was that the append-only log is the universal data integration primitive — every database is a log of writes, every replication system is a log shipper, every queue can be modelled as a log with destructive read. The argument behind Kafka's design was that the log is the strictly more general abstraction; a queue is a log plus a forgetful read, and it is easier to add forgetfulness to a log than to add memory to a queue. Most modern stream brokers — Kafka, Pulsar, Redpanda, Kinesis — are direct heirs of this argument. The argument is correct in the abstract, but in practice the operational cost of the log primitive is high enough that pure work-queue workloads still benefit from a dedicated queue broker.
Partitions are how streams get parallelism
A single-partition stream is fundamentally serial — every consumer reads the same log in order, and parallelism requires multiple consumer processes to coordinate over offsets. Streams scale by partitioning: the topic is split into N partitions, each an independent log, and the producer routes each message to a partition by a key (typically a hash of the entity ID). Within a partition, order is preserved; across partitions, no order is guaranteed. Consumer groups assign partitions to consumers — each partition is read by exactly one consumer in the group at a time, giving N-way parallelism. The trade-off is that ordering is now per-key, not global. CricStream partitions the match-events topic by match_id; every event for one match is in the same partition and stays ordered. Events across two simultaneous matches can interleave in any order on the consumer side, which is fine because nothing depends on cross-match ordering. The partition key is the design decision that determines what stays ordered and what scales out — picking it wrong is one of the most expensive corrections to make later, because re-partitioning a busy topic in production is a multi-day operation.
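A minimal sketch of key-based routing and the ordering it preserves. hashlib gives a hash that is stable across processes (Python's built-in hash() is salted per process); the match IDs are invented:

import hashlib

N_PARTITIONS = 4
partitions = [[] for _ in range(N_PARTITIONS)]

def route(key):
    # A stable hash of the key picks the partition, so every event for one
    # key lands on the same partition and stays in arrival order there.
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % N_PARTITIONS

events = [("match-IND-AUS", "wicket"), ("match-ENG-NZ", "four"),
          ("match-IND-AUS", "six"),    ("match-ENG-NZ", "wicket")]
for seq, (match_id, event) in enumerate(events):
    partitions[route(match_id)].append((seq, match_id, event))

for i, p in enumerate(partitions):
    print(f"partition {i}: {p}")
# Per-key order survives inside each partition; there is no ordering
# guarantee across partitions.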
Retention, compaction, and the "log is forever" myth
Streams do not actually keep messages forever. They keep them for the retention period, which is configurable per topic — 7 days is a common default. After retention, segments are deleted. There is also log compaction, a per-key garbage-collection mode where the broker keeps only the most recent message per key and deletes older ones. Compaction is what makes a stream usable as a materialised view — the log of "current balance per account" can be replayed to bootstrap a fresh consumer's state, even if the original credit/debit events are long gone. PaySetu uses compaction on the account-balance topic and time-based retention on the transaction-events topic. The two retention modes serve different needs: compacted topics are state, time-retained topics are history. A topic can do one or the other but generally not both. Choosing wrong forces an expensive migration.
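A minimal sketch of compaction as keep-latest-per-key; a dict stands in for the broker's compacted segment, and the account balances are invented:

# Log compaction keeps only the newest message per key. A compacted
# account-balance topic can rebuild a fresh consumer's state even after
# the original credit/debit events have been deleted.
log = [("acct-1", 100), ("acct-2", 50), ("acct-1", 80),
       ("acct-2", 75), ("acct-1", 130)]

def compact(log):
    latest = {}
    for key, value in log:     # later writes overwrite earlier ones
        latest[key] = value
    return latest

print(compact(log))            # {'acct-1': 130, 'acct-2': 75}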
Why MealRush moved off SQS and what it actually cost
MealRush (food delivery) ran on SQS for two years for the order-dispatch path. Each new order landed on SQS, a fleet of dispatch workers picked it up, assigned it to a delivery rider, and acked. Then product asked: "show me, in real time, all orders flowing through every city, for the operations dashboard". With SQS, the answer was hard — once dispatch acked a message, it was gone. The team's first attempt was an SNS topic with multiple SQS subscribers — one for dispatch, one for the dashboard, one for analytics. By month three the SNS-fanout topology had eight queues, six dead-letter queues, and a custom replay tool that pulled archived messages back out of S3. The migration to Kafka took four engineers eleven weeks and shipped a 60% reduction in broker complexity (a single Kafka topic replaced eight SNS-SQS pairs). The lesson the engineering blog drew was not "Kafka is better than SQS" — it was that the workload had been a stream from day one, and SQS made it feel like a queue until the cost of the disguise compounded.
When the right answer is "use both"
The cleanest production architectures often use both primitives in the same path. Events flow into a stream — Kafka or Pulsar — for the durable, fan-outable, replayable log of truth. A specific consumer that needs work-distribution semantics — say, a refund processor that wants ack/redeliver/dead-letter behaviour — pulls from the stream into a queue, runs its workers off the queue, and uses the queue's per-message ack semantics for failure handling. The stream remains the source of truth; the queue is a transient scratchpad for the work-distribution pattern. PaySetu's refund pipeline is built this way: the transaction-events Kafka topic feeds a small Lambda-fronted SQS queue that handles refund orchestration. The team gets fanout from the stream (analytics, fraud, ML all consume the same Kafka topic) and gets clean work-distribution from the queue (refund-worker fleet auto-scales, dead-letter handling is built-in). Picking both is sometimes the right answer; picking neither and reinventing one on top of the other is almost never the right answer.
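A minimal sketch of the hybrid shape, in the same in-memory style as the walkthrough: the list is the durable stream, queue.Queue is the transient work-distribution scratchpad, and the bridge function is the small consumer that connects them. The event shapes are invented:

import queue

# The stream is the source of truth; the queue holds only the subset of
# events one worker fleet needs, and only while the work is in flight.
stream_log = [{"type": "payment", "id": "txn-1"},
              {"type": "refund",  "id": "txn-2"},
              {"type": "payment", "id": "txn-3"},
              {"type": "refund",  "id": "txn-4"}]

refund_q = queue.Queue()

def bridge(log, offset, q):
    # A stream consumer that feeds the work queue. Its committed offset is
    # the only state it owns; the queue's per-message ack semantics handle
    # worker failure from here on.
    while offset < len(log):
        evt = log[offset]
        if evt["type"] == "refund":
            q.put(evt)
        offset += 1
    return offset

bridge(stream_log, 0, refund_q)
while not refund_q.empty():
    evt = refund_q.get()
    print("refund worker processing", evt["id"])
    refund_q.task_done()       # the ack; a real broker would redeliver without it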
Where this leads next
The next chapters in Part 15 dive into specific stream brokers — Kafka, Pulsar, Redpanda, NATS — and then into the operational mechanics that fall out of the stream primitive: partition rebalancing, consumer-group coordination, exactly-once semantics, log compaction, and tiered storage.
If you came here from Part 14, the bridge is the /wiki/wall-atomic-broadcast-needs-ordering chapter — the stream broker is, structurally, an atomic-broadcast log exposed as a product. The replication you cared about for transactions is the same primitive you are now consuming as a streaming substrate.
The deeper rabbit hole is change-data-capture: streams that are not produced by application code at all, but by tailing a database's write-ahead log. CDC turns every database into a stream source and is the foundation of every modern data pipeline. The chapter that picks that up is /wiki/cdc-debezium-and-the-wal.
References
- Kreps, The Log: What every software engineer should know about real-time data's unifying abstraction, LinkedIn Engineering 2013 — the foundational essay.
- Kreps, Narkhede & Rao, Kafka: A distributed messaging system for log processing, NetDB 2011 — the original Kafka paper.
- RabbitMQ Streams documentation — the canonical reference for queues-vs-streams in a single broker.
- AWS, Amazon SQS vs Amazon Kinesis — the cloud-vendor framing of the same split.
- Confluent, Kafka consumer groups vs traditional message queues — practical guide to the operational difference.
- Pulsar documentation: subscription types — the four subscription modes (exclusive, shared, failover, key-shared) and how they map to queue vs stream semantics.
- /wiki/wall-atomic-broadcast-needs-ordering — the previous chapter; why the stream broker is structurally atomic broadcast.
- /wiki/eventual-consistency — the related ordering trade-off that streams sit alongside.