Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
Lightweight streaming: NATS, Redpanda
MealRush's logistics team has a Friday-night habit. At 8:30pm, courier-location pings start arriving at 180,000 events per second, and Aditi — the on-call SRE — watches the four-broker Kafka cluster's CPU climb from 35% to 78%. The data is small, mostly throwaway, retained for 90 minutes. It does not need a cross-AZ Raft log, MirrorMaker, or a 4 GB JVM heap per broker. It needs a fast pipe. The post-mortem question after one too many late-night under-replicated-partitions alerts: do we actually need Kafka here, or did we add Kafka because Kafka is what the previous team added? This chapter is about two answers to that question — NATS, a system designed from the ground up to be small and fast, and Redpanda, a system that keeps Kafka's wire protocol but throws away most of Kafka's mass.
Lightweight streaming is the half of Part 15 that the Kafka chapters skipped. The mental model is the same log abstraction, but with sharply different operational footprints — fewer moving parts, no JVM, no separate metadata service, faster to start, easier to embed. The trade-offs are real: features Kafka has built up over a decade are still being filled in, and at the very largest scales the engineering investment Kafka has absorbed shows. But for the long tail of "I need a bus, not a data lake", these systems exist and they are good.
NATS and Redpanda are two answers to "Kafka is too much". NATS is a 20-MB Go binary offering subject-based pub-sub by default and a separate Raft-replicated log called JetStream when durability is needed; it boots in 200 ms and routes a million in-memory messages per second on a laptop. Redpanda is a C++ rewrite of the Kafka broker — same wire protocol, same client libraries, no JVM, no ZooKeeper, a per-shard thread-per-core architecture — that aims to give you Kafka's API at a fraction of Kafka's operational mass. Pick NATS when the workload is fan-out, request-reply, or short-retention buffering. Pick Redpanda when you have an existing Kafka ecosystem and want to drop the broker mass without rewriting clients.
What "lightweight" means in practice
Two numbers explain why people reach past Kafka. First, Kafka's broker process holds a JVM heap (typically 6–10 GB) plus off-heap page cache; a three-broker cluster uses about 30 GB of RAM before any data flows. Second, until KRaft stabilised, Kafka required a separate ZooKeeper ensemble of three or five nodes. For a 50,000-message-per-second internal-traffic workload at MealRush, that means eight nodes and 40 GB of RAM to run a service whose actual hot working set is 2 GB.
NATS and Redpanda attack this from opposite ends. NATS rejects the Kafka feature set: it is fundamentally a subject-based message bus, with persistence as an opt-in module (JetStream). Its core dependency is Go's runtime — the binary is a single static file, and the broker's RAM at idle is around 12 MB. Redpanda accepts Kafka's feature set but rejects Kafka's implementation: it implements the Kafka wire protocol byte-for-byte in C++ on top of Seastar's thread-per-core, share-nothing model, with metadata in an internal Raft group. A three-node Redpanda cluster on c6i.xlarge does what a six-node Kafka cluster (three brokers + three ZooKeeper) does on the same hardware, with about 40% lower p99 produce latency at typical loads.
Why the weight matters: at small workloads (under 100k messages/sec, retention measured in hours), Kafka's overhead does not amortise. The JVM's GC tuning, the ZooKeeper quorum, the partition rebalancing — all of these are costs you pay continuously and benefits you only need at the scales where the cluster runs hot. NATS at MealRush's logistics scale would consume roughly 1/30th the RAM at comparable throughput, and Redpanda would consume roughly 1/3rd while keeping the existing Kafka clients. The right choice is not the one with the most features; it is the one whose operational mass is proportional to the value the workload generates.
NATS — subjects, queues, and JetStream
The mental model for NATS is closer to a routing fabric than a log. A producer publishes to a subject like pings.couriers.bengaluru.east; a subscriber subscribes to a subject pattern like pings.couriers.bengaluru.* or pings.couriers.> (the > matches any number of trailing tokens). The broker (called nats-server) routes messages from publishers to all matching subscribers in microseconds — there is no log, no offset, no stored consumer position. Subscribers that share a queue group split the traffic instead: each message is delivered to exactly one member of the group, which is NATS's answer to load-balanced consumption. If no subscriber is listening, the message is dropped. If a subscriber falls too far behind, the broker disconnects it (the buffering limits are configurable). This is at-most-once delivery by default.
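A minimal sketch of core pub-sub and queue-group delivery with the nats-py client. It assumes the dockerised nats-server from the "Reproduce on your laptop" section; the subjects, payload, and queue-group name are illustrative.
# core_pubsub.py — core NATS pub-sub: fan-out subscription plus a queue group
# (at-most-once, no JetStream, no offsets).
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")

    # Plain subscription: every subscriber on this pattern sees every message.
    async def audit(msg):
        print(f"audit    [{msg.subject}] {msg.data.decode()}")

    # Queue group: the broker delivers each message to exactly one member of
    # the "dispatchers" group — load-balanced consumption without a log.
    async def dispatch(msg):
        print(f"dispatch [{msg.subject}] {msg.data.decode()}")

    await nc.subscribe("pings.couriers.>", cb=audit)
    await nc.subscribe("pings.couriers.>", queue="dispatchers", cb=dispatch)
    await nc.subscribe("pings.couriers.>", queue="dispatchers", cb=dispatch)

    await nc.publish("pings.couriers.bengaluru.east", b'{"courier":"CR-417"}')
    await nc.flush()
    await asyncio.sleep(0.1)  # give the callbacks a moment to run
    await nc.close()

asyncio.run(main())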
For workloads that need durability, NATS adds JetStream — a separate subsystem in the same binary that turns matching subjects into a stream with disk-backed retention, and gives consumers durable cursors plus per-message acknowledgement. The stream is replicated across nodes via Raft (1, 3, or 5 replicas). Once you flip JetStream on, the surface looks much closer to Kafka: durable storage, ordered consumers, replay from a position. The shape difference: in Kafka, the topic is the persistence unit; in JetStream, the stream is a wrapper that captures one or more subject patterns into a single log, so you can route logically (by subject) and persist physically (by stream).
subject: pings.couriers.bengaluru.east ──┐
subject: pings.couriers.bengaluru.west ──┼──> stream: courier-pings-bengaluru
subject: pings.couriers.bengaluru.south ──┘ (3 replicas, 6h retention)
A single NATS deployment routinely runs core pub-sub for ephemeral fan-out (telemetry, presence, request-reply for microservices) and JetStream for the few subjects that need durability (orders, payments, audit). The same wire protocol; the difference is whether messages are matched-and-forwarded or matched-and-stored.
Why subject hierarchy is the load-bearing primitive: a subject like orders.merchant.MR-2018.upi.created carries five orthogonal dimensions (event-class, role, merchant-id, payment-method, lifecycle). A subscriber that wants "all UPI events for merchant MR-2018" subscribes to orders.merchant.MR-2018.upi.> and the broker filters at routing time. Compare with Kafka, where the topic is one dimension — you either partition by merchant and filter on payment-method downstream, or you create one topic per (merchant × method) cross product and explode topic count. NATS pushes filtering into the broker by design, which is why it pays off for high-cardinality fan-out workloads.
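A sketch of that broker-side filtering with both wildcard forms, against the same local server — the subjects reuse the hypothetical merchant ids from the paragraph above.
# subject_filters.py — the broker, not the consumer, decides which events each
# subscription sees; * matches exactly one token, > matches all remaining tokens.
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")

    async def show(msg):
        print(f"[{msg.subject}] {msg.data.decode()}")

    await nc.subscribe("orders.merchant.MR-2018.upi.>", cb=show)   # one merchant, UPI only
    await nc.subscribe("orders.merchant.*.*.created", cb=show)     # every creation, any merchant / method

    await nc.publish("orders.merchant.MR-2018.upi.created", b"matches both")
    await nc.publish("orders.merchant.MR-2018.upi.refunded", b"matches the first only")
    await nc.publish("orders.merchant.MR-3307.card.created", b"matches the second only")
    await nc.flush()
    await asyncio.sleep(0.1)
    await nc.close()

asyncio.run(main())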
Redpanda — same protocol, different engine
Redpanda's pitch is simpler: keep the Kafka API, throw out the implementation. The broker is written in C++ on top of Seastar — a thread-per-core, share-nothing async framework originally built for ScyllaDB. Each CPU core runs one shard, owns its share of partitions, and never communicates with other shards via locks; cross-shard traffic goes through explicit message passing. There is no JVM, no garbage collector, and no reliance on the OS page cache (Redpanda manages its own I/O and caching). Replication uses internal Raft groups per partition; metadata uses one cluster-wide Raft group. ZooKeeper is gone.
For the user, the change is invisible at the wire level. Existing producer / consumer libraries connect to Redpanda the same way they connect to Kafka. Tools like kafkactl, Schema Registry, Kafka Connect, and kcat work unmodified. The Kafka Improvement Proposals (KIPs) Redpanda implements are tracked publicly; when a KIP lands in Kafka, Redpanda often implements it within months.
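One concrete illustration of the drop-in compatibility: with the single-node Redpanda container from the end of this chapter running and the benchmark's pings topic populated, stock kcat (if installed) reads it exactly as it would read a Kafka topic.
# Consume the first five records of the "pings" topic from Redpanda with kcat.
kcat -b localhost:9092 -t pings -C -o beginning -c 5 -e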
Why thread-per-core matters at the tail: in a JVM-based broker, every produce request can be parked behind a stop-the-world GC pause. The pause is rare, but it is correlated across all in-flight requests on that broker, so the p99.9 of produce latency follows the GC schedule. In Seastar's model, each shard processes its requests on a single OS thread with no shared heap, no GC, and no kernel-level context switch. KapitalKite's order-routing service moved from Kafka to Redpanda and observed p99.9 produce latency drop from 380 ms to 41 ms on the same hardware, because the long-tail GC pauses simply did not exist anymore.
A produce-and-consume run on both engines
The point of "Kafka-compatible" is that the same client code works. Here is a runnable Python script that produces 200 messages each to a single-node NATS server with JetStream enabled and to a single-node Redpanda broker, consumes them back, and measures end-to-end publish-to-consume latency. You can run both halves on a laptop with two terminals.
# lightweight_streaming.py — compare publish→consume latency on NATS JetStream
# and Redpanda from a single Python process.
#
# pip install nats-py confluent-kafka
#
# Setup (separate terminals):
# docker run -d --name nats -p 4222:4222 nats:2.10 -js
# docker run -d --name redpanda -p 9092:9092 \
# redpandadata/redpanda:v23.3.5 redpanda start \
# --overprovisioned --smp 1 --memory 1G --reserve-memory 0M \
# --node-id 0 --check=false \
# --kafka-addr PLAINTEXT://0.0.0.0:9092 \
# --advertise-kafka-addr PLAINTEXT://localhost:9092
import asyncio, statistics, time
import nats
from confluent_kafka import Producer, Consumer
from confluent_kafka.admin import AdminClient, NewTopic
N = 200
PAYLOAD = b'{"courier":"CR-417","lat":12.97,"lng":77.59,"ts":%d}'
async def nats_run():
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()
    await js.add_stream(name="pings", subjects=["pings.>"])
    sub = await js.subscribe("pings.>", durable="bengaluru-sub",
                             ordered_consumer=False, manual_ack=True)
    latencies = []
    for i in range(N):
        t0 = time.perf_counter_ns()
        await js.publish(f"pings.couriers.{i}", PAYLOAD % t0)
        msg = await sub.next_msg(timeout=2.0)
        t1 = time.perf_counter_ns()
        await msg.ack()
        latencies.append((t1 - t0) / 1e6)  # ms
    await nc.close()
    return latencies
def redpanda_run():
    # Redpanda's topic auto-creation default differs from Kafka's, so create
    # the topic explicitly instead of relying on it.
    admin = AdminClient({"bootstrap.servers": "localhost:9092"})
    for fut in admin.create_topics([NewTopic("pings", num_partitions=1,
                                             replication_factor=1)]).values():
        try:
            fut.result()
        except Exception:
            pass  # topic already exists from a previous run
    p = Producer({"bootstrap.servers": "localhost:9092",
                  "linger.ms": 0, "acks": "1"})
    c = Consumer({"bootstrap.servers": "localhost:9092",
                  # offsets are never committed, so re-runs with this group.id
                  # replay earlier messages — delete the topic between runs
                  "group.id": "bengaluru-sub",
                  "auto.offset.reset": "earliest",
                  "enable.auto.commit": False})
    c.subscribe(["pings"])
    latencies = []
    for i in range(N):
        t0 = time.perf_counter_ns()
        p.produce("pings", PAYLOAD % t0); p.flush()
        msg = None
        while msg is None or msg.error():
            msg = c.poll(2.0)
        t1 = time.perf_counter_ns()
        latencies.append((t1 - t0) / 1e6)
    c.close()
    return latencies
def stats(label, xs):
    xs = sorted(xs)
    p50 = xs[len(xs) // 2]
    p99 = xs[int(len(xs) * 0.99)]
    print(f"{label:18} p50={p50:.2f}ms p99={p99:.2f}ms "
          f"mean={statistics.mean(xs):.2f}ms n={len(xs)}")

if __name__ == "__main__":
    nl = asyncio.run(nats_run())
    rl = redpanda_run()
    stats("NATS JetStream", nl)
    stats("Redpanda", rl)
Sample run on a 2024 laptop with both containers running locally:
NATS JetStream p50=0.78ms p99=2.41ms mean=0.96ms n=200
Redpanda p50=2.14ms p99=6.83ms mean=2.42ms n=200
Walkthrough of the load-bearing lines:
await js.add_stream(name="pings", subjects=["pings.>"])— JetStream streams are created explicitly and bind one or more subject patterns. The>is the multi-token wildcard. Once the stream exists, everypublishmatching a bound subject is captured durably. There is no equivalent in Kafka — every Kafka topic is a stream.durable="bengaluru-sub"— JetStream consumers can be ephemeral (server forgets the cursor when the connection drops) or durable (cursor persists in the stream's metadata). Naming the consumer makes it durable. This is the closest analogue to Kafka's consumer group, but at the consumer level rather than per-topic.acks="1"on Redpanda — Redpanda speaks the Kafka producer protocol so the sameackssemantics apply:acks=1means the leader-shard's ack is enough;acks=allwaits for the partition's full Raft quorum. Redpanda's defaults are conservative (acks=all) — explicitacks=1makes the latency comparison fairer at the protocol level.p.flush()—confluent-kafka(librdkafka) buffers asynchronously by default. Callingflushafter each produce makes this a synchronous round-trip for the measurement. In production you would never do this; you would let the buffer fill, batch, and amortise the network cost.time.perf_counter_ns— both runs measure the same span: time-to-publish plus time-to-receive on the consumer side. This is end-to-end publish-to-deliver latency, not just produce-ack. NATS's lower numbers come partly from in-process subject routing and partly from the smaller wire protocol; Redpanda's are dominated by the Kafka protocol's per-batch overhead.
Why these numbers are not "NATS is 3x faster than Redpanda forever": this is a single-message, single-producer, single-consumer microbenchmark with no batching. At realistic batch sizes (Redpanda producer with linger.ms=10, batches of 100 messages), Redpanda's amortised per-message produce cost is comparable to NATS, and Redpanda's throughput ceiling is higher because its log-replication is more efficient under heavy parallel load. The microbenchmark exposes the fixed overhead of each protocol; the steady-state throughput numbers tell a different story. Always benchmark the workload you actually run.
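To see the batching effect rather than the fixed per-message overhead, here is a hedged sketch against the same single-node Redpanda and the same pings topic — the message count and the linger / batch settings are illustrative, not tuned recommendations.
# batched_produce.py — amortised produce cost with batching, in contrast to the
# per-message flush() in the benchmark above.
import time
from confluent_kafka import Producer

N = 50_000
p = Producer({"bootstrap.servers": "localhost:9092",
              "linger.ms": 10,                # let librdkafka coalesce batches
              "batch.num.messages": 10_000,
              "acks": "1"})

t0 = time.perf_counter()
for i in range(N):
    p.produce("pings", b'{"seq":%d}' % i)
    p.poll(0)                                 # serve delivery callbacks without blocking
p.flush()                                     # one flush at the end, not one per message
dt = time.perf_counter() - t0
print(f"{N} messages in {dt:.2f}s -> {N / dt:,.0f} msg/s amortised")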
Common confusions
- "NATS is just a faster Kafka." No — NATS core is at-most-once pub-sub with no log. JetStream adds the log, and even there the data model is subject-routed streams, not topic-partition offsets. The mental models differ enough that porting a Kafka design directly to NATS often misses the point of NATS's subject hierarchy.
- "Redpanda has no Raft because it's single-binary." Redpanda is full of Raft — every partition is its own Raft group, and there is also a cluster-wide controller Raft group for metadata. The single-binary property is about packaging, not about consensus. The controller Raft replaces ZooKeeper / KRaft externally; per-partition Raft replaces ISR.
- "NATS subjects are like Kafka topic names." Subjects support hierarchical wildcards (
*for a single token,>for multi-token), so a subscriber toorders.MR-2018.>receives all events whose subject starts with that prefix. Kafka topic names are flat strings; you cannot subscribe to "all topics matching a prefix" in Kafka without an external listing API. - "Redpanda is exactly Kafka, just with no JVM." The wire protocol matches, but operational details differ. There is no
log.dirsper disk; each shard owns its own data directory. Topic auto-creation defaults differ. Some KIPs land later than in Apache Kafka. For the bulk of producer-consumer code, this is invisible; for tooling that reads broker internals, it sometimes is not. - "JetStream replaces Kafka." JetStream adds durable streaming to NATS, but its tooling ecosystem is younger. There is no equivalent of Kafka Connect's hundreds of source / sink connectors out of the box, and Schema Registry integrations are still maturing. JetStream is a good answer for "I have NATS already and need durability for some subjects"; it is a less obvious answer for "I want to migrate off Kafka tomorrow".
- "Lightweight means less reliable." No — both NATS JetStream and Redpanda use Raft for replicated durability and have run production workloads of millions of messages per second. The lightness is in operational footprint (process count, memory, dependency chain), not in the durability story. A 3-node JetStream cluster with R=3 streams and a 3-node Redpanda cluster with replication-factor 3 both survive any single-node loss without data loss.
Going deeper
NATS leaf nodes and the global mesh
NATS's deployment topology has a feature Kafka does not: leaf nodes. A leaf is a small NATS server that connects to a parent cluster, brings up a local subject namespace, and selectively bridges subjects bidirectionally. The point: edge devices, customer premises, or a developer's laptop can each run a NATS server, talk to local subscribers in microseconds, and stream selected subjects to the central cluster over a single TCP connection. AutoGo uses leaf nodes in driver-side mobile clients (via the embedded NATS-as-a-library) so that GPS pings publish locally first, get filtered, and only the per-second aggregate goes upstream — saving 80% of cellular bytes versus publishing every ping to a central broker. The architectural primitive (subject-as-routing) makes this composition natural.
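A minimal sketch of the edge side of such a deployment. The config shape follows the NATS server's leaf-node documentation, but the hostname and the absence of credentials and TLS are simplifications; a real deployment would add both, and the central cluster must expose a leafnodes listener (port 7422 by convention).
# Run a local leaf node: local subscribers connect to it directly, and the
# subjects it carries are bridged upstream over a single outbound connection.
cat > leaf.conf <<'EOF'
port: 4222                                        # local clients connect here
leafnodes {
  remotes = [
    { url: "nats://central-hub.internal:7422" }   # placeholder hub address
  ]
}
EOF
nats-server -c leaf.conf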
Redpanda's tiered storage and shadow indexing
Redpanda implements tiered storage similarly to Kafka's KIP-405: closed log segments are uploaded to S3 / GCS, and the broker keeps only an index. Reads on offloaded data fetch the segment over HTTP, decompress, and serve. Redpanda's name for the index is shadow indexing, and the implementation is per-partition (each Raft group manages its own offload schedule). The performance trade-off mirrors Pulsar's tiered storage: hot reads are sub-millisecond, cold-tier reads are 50–200 ms. The win is operational — a Redpanda cluster with tiered storage on can run with 100 GB of local SSD per broker and still retain months of data, which directly attacks Kafka's "your storage scales with your retention" tax.
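A sketch of what switching it on looks like with rpk — these are the documented cluster and topic property names, but check the docs for your Redpanda version; the bucket and region are placeholders, and the object-store credential settings are omitted.
# Cluster-level: point the cluster at an object-store bucket.
rpk cluster config set cloud_storage_enabled true
rpk cluster config set cloud_storage_bucket mealrush-redpanda-tiered   # placeholder
rpk cluster config set cloud_storage_region ap-south-1                 # placeholder
# Topic-level: offload closed segments and keep roughly 1 GB hot on local SSD.
rpk topic alter-config pings --set redpanda.remote.write=true \
  --set redpanda.remote.read=true \
  --set retention.local.target.bytes=1073741824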
NATS's ordered consumer and KV store
JetStream has two features worth highlighting beyond the basic stream-and-consumer model. The ordered consumer is an ephemeral consumer that auto-resubscribes from the last received message id on connection loss, with no durable cursor needed — useful for read-only fan-out where redelivery on disconnect is acceptable but a durable cursor is not worth the metadata cost. The JetStream KV store is built on top of streams: each key is a subject under a stream's prefix, each put is a publish, the latest message wins. Reads use a special consumer that materialises the current value. PaySetu uses JetStream KV for distributed feature flags — 200 keys, replicated to three nodes, watch-on-update streaming changes to every service in 8 ms p99. It is not a replacement for etcd or Consul, but for the "config that changes hourly and ten services need to react" use case it is one fewer system to run.
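A put / get round-trip as a sketch against the same local JetStream server — the bucket and key names are illustrative, and on a single node the bucket is effectively unreplicated.
# js_kv.py — JetStream KV as a small replicated config store: each key is a
# subject under the bucket's stream, each put is a publish, latest value wins.
import asyncio
import nats

async def main():
    nc = await nats.connect("nats://localhost:4222")
    js = nc.jetstream()

    kv = await js.create_key_value(bucket="feature-flags")

    await kv.put("checkout.new-flow", b"on")
    entry = await kv.get("checkout.new-flow")
    print(entry.key, entry.value, "revision", entry.revision)

    await kv.put("checkout.new-flow", b"off")   # newer revision supersedes the old one
    entry = await kv.get("checkout.new-flow")
    print(entry.key, entry.value, "revision", entry.revision)

    await nc.close()

asyncio.run(main())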
When neither is the answer
Both NATS and Redpanda have boundaries. For stream processing with stateful joins / windowing — Flink-style continuous aggregations, ksqlDB-equivalent SQL queries — Kafka's ecosystem is meaningfully richer; both NATS and Redpanda either rely on external processors or have less mature alternatives. For deeply integrated change-data-capture from databases (Debezium, Kafka Connect's hundreds of certified connectors), Kafka is the path of least resistance. For multi-region active-active replication with deterministic conflict resolution, neither system has a story as polished as Confluent Cluster Linking — JetStream's mirror streams and Redpanda's cluster-linking-equivalent are both maturing. The honest framing: lightweight wins at the operational layer; heavyweight still wins at the ecosystem layer. The bet you make is whether the workload's value comes from broker-throughput / cluster-mass or from the connectors and stream-processing toolchain on top.
Reproduce on your laptop
# NATS with JetStream enabled
docker run -d --name nats -p 4222:4222 nats:2.10 -js
# Redpanda single-node
docker run -d --name redpanda -p 9092:9092 \
redpandadata/redpanda:v23.3.5 redpanda start \
--overprovisioned --smp 1 --memory 1G --reserve-memory 0M \
--node-id 0 --check=false \
--kafka-addr PLAINTEXT://0.0.0.0:9092 \
--advertise-kafka-addr PLAINTEXT://localhost:9092
pip install nats-py confluent-kafka
python3 lightweight_streaming.py
# Inspect:
# the nats CLI (github.com/nats-io/natscli) is a separate install; it is not bundled in the nats server image
nats stream ls -s nats://localhost:4222
docker exec redpanda rpk topic list
Where this leads next
- /wiki/kafka-as-a-distributed-log — the previous chapter, for the heavyweight reference implementation.
- /wiki/pulsars-architecture — Pulsar attacks the same operational pain points with a different layering choice (broker / BookKeeper split).
- /wiki/at-least-once-idempotency-in-practice — both NATS JetStream and Redpanda give you at-least-once; the consumer-side idempotency story is the same recipe.
The next chapter in this part covers streaming SQL and stateful processing — the layer above the broker where the choice of broker becomes less load-bearing because the processor maintains its own state, and the broker's job collapses to "deliver this partition to this worker, in order, with checkpoints".
References
- Derek Collison, NATS — A High-Performance Messaging System — core docs, includes the subject-hierarchy and JetStream design pages.
- NATS documentation (Synadia), JetStream Concepts — the stream / consumer / KV model in NATS's own words.
- Alexander Gallego, The Redpanda thesis — design rationale for thread-per-core streaming.
- Avi Kivity et al., Seastar: a high-performance server-side framework — the C++ async framework underlying Redpanda.
- Diego Ongaro, John Ousterhout, In Search of an Understandable Consensus Algorithm — the Raft paper; both Redpanda and JetStream use Raft for replication.
- Apache Kafka contributors, KIP-405: Kafka Tiered Storage — referenced for the cold-tier comparison.
- Jay Kreps, The Log: What every software engineer should know about real-time data's unifying abstraction — the foundational essay; lightweight engines still inherit its mental model.
- /wiki/the-append-only-log-simplest-store — cross-curriculum primer on why a log is the right primitive at all.