Kafka vs Pulsar vs Kinesis vs Redpanda
A Razorpay platform team is choosing the message bus for the next five years of payments traffic. The architect has two whiteboards full of arrows: a 50,000 partition Kafka cluster on i4i instances, a Pulsar deployment with BookKeeper as the storage layer, an AWS Kinesis Data Streams option that means "no brokers to run", and a Redpanda binary that promises drop-in Kafka API compatibility on a quarter of the hardware. The same producer code talks to all four. The same consumer code reads from all four. So the question — the one that decides whether on-call gets paged at 2 a.m. or sleeps through — isn't "which API". It's "what is each system doing under the hood with my bytes?"
Kafka, Pulsar, Kinesis, and Redpanda all expose partitions, producers, and consumers — and they all disagree on the layer below. Kafka couples broker, storage, and compute on one node; Pulsar separates the broker (stateless) from BookKeeper (storage); Kinesis hides everything behind a managed shard model with hard per-shard limits; Redpanda re-implements the Kafka protocol on a thread-per-core C++ engine with no JVM and no page cache. The choice is mostly a cost-vs-control trade in the operational layer, not the API.
Why the API isn't the difference
Every one of these systems exposes the same surface: a topic is split into partitions (or shards), each partition is an ordered append-only sequence of records, producers append, consumers read by offset, and consumer groups coordinate to assign partitions across instances. Kafka's protocol won — Pulsar ships a Kafka-on-Pulsar gateway, Redpanda implements the wire protocol byte-for-byte, and Kinesis exposes a slightly-different-but-equivalent shard API. So when a Zerodha engineer writes a KafkaProducer in their Java app, the same five lines of code (with one config change) work against Apache Kafka, Confluent Cloud, Redpanda, Pulsar's KoP gateway, and a Kinesis-via-MSK-compat layer. The API is no longer the differentiator.
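To make the "one config change" concrete, here is a pure-Python sketch of what actually differs in the producer config across backends. The bootstrap addresses are hypothetical placeholders; the config keys are standard Kafka client properties.

```python
# Sketch: what changes when the same producer code targets a different
# backend. Endpoints below are hypothetical placeholders.
BASE_CONFIG = {
    "acks": "all",              # standard Kafka client properties
    "compression.type": "lz4",
    "linger.ms": 5,
}

ENDPOINTS = {  # hypothetical bootstrap addresses, one per backend
    "apache_kafka": "kafka.internal:9092",
    "redpanda": "redpanda.internal:9092",
    "pulsar_kop": "pulsar-kop.internal:9092",  # Kafka-on-Pulsar gateway
}

def producer_config(backend: str) -> dict:
    # The entire per-backend delta is one key.
    return {**BASE_CONFIG, "bootstrap.servers": ENDPOINTS[backend]}

# Everything except the endpoint is identical across backends.
stripped = [
    {k: v for k, v in producer_config(b).items() if k != "bootstrap.servers"}
    for b in ENDPOINTS
]
assert all(s == stripped[0] for s in stripped)
```

The same holds for consumer configs: migrating or benchmarking across these systems doesn't touch application code, only the endpoint (and, for Kinesis, a translation layer).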
What changes underneath is the storage architecture. Specifically: where do the bytes live, who is responsible for replicating them, and how is "the broker that handles this partition's writes" coupled to "the disk holding the last 7 days of records"?
Why the storage layer is the differentiator: append-throughput, replication policy, and "what happens when one broker dies" are all properties of the storage layer, not the API. A producer sending 1 million events/sec doesn't care whether the bytes land in Kafka segment files or BookKeeper ledger fragments — but the operator who has to add capacity, rebalance partitions, or recover from a single-node failure cares enormously. The storage layer is what they spend their time tuning.
Kafka: the baseline you've already learned
You already know Kafka from the previous chapters in Build 7 — partitions, segments, ISR, retention, compaction, tiered storage. The shape is: each broker is responsible for some partitions, the leader for that partition handles writes, followers replicate. Local disk holds segments. Page cache makes reads fast. Replication factor 3 means each partition's records exist on three different brokers' disks.
This architecture has two consequences that matter for the comparison. First, adding broker capacity requires moving data. If the cluster is at capacity and you add three new brokers, partitions don't automatically move — you have to run a partition reassignment (kafka-reassign-partitions.sh), which physically copies segment files across the network until the new brokers hold their share. For a cluster with 200 TB of data, that rebalance can take days, and during the rebalance the network is saturated with replication traffic that competes with live producer traffic. Second, broker, compute, and storage scale together. If you need more storage you add brokers, even if you don't need more CPU. If you need more CPU you also add disks you don't need.
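A back-of-envelope sketch shows why "add brokers" is not a fast lever; the cluster size and replication throttle below are assumed figures, not measurements.

```python
# Back-of-envelope: how long the partition reassignment above takes.
# Cluster size and replication throttle are assumed, not measured.
TB = 1024**4

cluster_bytes = 200 * TB              # total on-disk data, replicas included
brokers_before, brokers_added = 12, 3
# Bytes that must move so the new brokers hold their proportional share:
moved = cluster_bytes * brokers_added / (brokers_before + brokers_added)

# Throttled replication: each new broker ingests at ~100 MB/s so live
# producer traffic isn't starved (an assumed, conservative throttle).
ingest_rate = brokers_added * 100 * 1024**2   # bytes/sec across the cluster

hours = moved / ingest_rate / 3600
print(f"move {moved / TB:.0f} TB at throttled rate -> {hours:.0f} hours")
```

Raise the throttle and the move finishes sooner, but the throttle exists precisely because reassignment traffic shares the network with live producers; either way, new capacity isn't usable for hours to days.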
Kafka's tiered storage (KIP-405, covered in the previous chapter) softens this — the cold tail moves to S3 — but the hot path is still broker-coupled. For a Razorpay cluster with 7-day local retention, the local disk pressure is bounded by 7d × throughput × replication_factor, and adding throughput still requires more brokers. The architectural decision Kafka commits you to is "compute and storage are colocated and scale together".
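The bound can be stated as arithmetic; the 200 MB/s producer rate below is an assumed example, not a Razorpay figure.

```python
# The local-disk bound stated above: retention × throughput × replication.
# The producer rate is an assumed example, not a Razorpay figure.
retention_seconds = 7 * 24 * 3600        # 7-day local retention
producer_rate = 200 * 1024**2            # 200 MB/s into the cluster
replication_factor = 3

local_bytes = retention_seconds * producer_rate * replication_factor
print(f"hot tier needs ~{local_bytes / 1024**4:.0f} TiB of local disk")
```

Doubling throughput doubles this number, and under the colocated model the only way to add that disk is to add brokers, whether or not you need their CPU.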
KRaft (KIP-500, Kafka 3.3+) replaced ZooKeeper for cluster metadata but didn't change this fundamental shape. The metadata layer is now an internal Raft quorum among a few "controller" brokers, but the data plane — partitions on brokers, replication, segments — is unchanged. The Kafka of 2026 has dropped its operational dependency on ZooKeeper but kept its operational dependency on co-located storage.
Pulsar: separate the broker from the disk
Apache Pulsar's central design choice is to decouple the broker from the storage layer. A Pulsar broker is stateless — it owns no partitions, holds no data on disk, can be killed and replaced in seconds. Underneath the brokers sits a separate cluster of "bookies" running Apache BookKeeper, which is the actual durable storage. A topic's records are written by a broker into BookKeeper "ledgers" (think: append-only logs), which BookKeeper replicates across N bookies (the qw write quorum, typically 3 of 5).
When a Pulsar broker dies, the topics it was serving are reassigned to surviving brokers within seconds — no data movement, no rebalance, because the data was never on the broker. The new broker just opens a connection to the same BookKeeper ledgers and resumes serving. This is operationally distinct from Kafka, where losing a broker triggers replica election and ISR adjustments and partition migration.
The trade-off is two services to operate instead of one. A Pulsar deployment needs a healthy broker tier, a healthy BookKeeper tier, and a metadata layer (typically ZooKeeper, recently moving toward etcd). Each tier scales independently — if you need more storage, add bookies; if you need more producer connections, add brokers. For Flipkart's catalogue events at peak Big Billion Day load, the broker tier saw 10× the steady-state QPS while the storage tier was unchanged (record sizes hadn't grown; only connection count had); they scaled brokers from 12 to 48 in 30 minutes without touching BookKeeper. The same scenario in Kafka would have required adding brokers and waiting hours for partition rebalance.
```python
# pulsar_vs_kafka_topology.py — show what each system asks you to manage
SYSTEMS = {
    "kafka": {
        "broker_count": 12,           # holds partitions + serves clients
        "storage_count": 0,           # storage is on the brokers
        "metadata": "kraft",          # internal Raft quorum
        "scaling_unit": "broker",     # one knob: brokers
        "data_movement_on_scaling": "yes",   # partition rebalance
    },
    "pulsar": {
        "broker_count": 8,            # stateless routing tier
        "storage_count": 9,           # bookies (3 of 5 quorum + headroom)
        "metadata": "zookeeper",      # external (or etcd in newer deploys)
        "scaling_unit": "broker_or_bookie",  # two knobs
        "data_movement_on_scaling": "no",    # bookies join the read/write quorum
    },
    "kinesis": {
        "broker_count": "n/a",        # AWS managed
        "storage_count": "n/a",
        "metadata": "n/a",
        "scaling_unit": "shard",      # one knob: shard count
        "data_movement_on_scaling": "no",    # AWS handles re-sharding
    },
    "redpanda": {
        "broker_count": 6,            # holds partitions + serves clients
        "storage_count": 0,           # like Kafka, colocated
        "metadata": "raft_internal",  # always built-in, no ZK dependency
        "scaling_unit": "broker",
        "data_movement_on_scaling": "yes",   # same as Kafka
    },
}

def operability_score(s):
    # Crude proxy: fewer moving parts = lower ops cost
    parts = 1
    if s["broker_count"] != "n/a":
        parts += 1
    if s["storage_count"] not in ("n/a", 0):
        parts += 1  # separate storage tier
    if s["metadata"] not in ("n/a", "raft_internal", "kraft"):
        parts += 1  # external metadata service
    return parts

for name, cfg in SYSTEMS.items():
    print(f"{name:10} parts_to_operate={operability_score(cfg)} "
          f"scaling={cfg['scaling_unit']:18} "
          f"data_moves_on_scale={cfg['data_movement_on_scaling']}")
```

Running it prints:

```
kafka      parts_to_operate=2 scaling=broker             data_moves_on_scale=yes
pulsar     parts_to_operate=4 scaling=broker_or_bookie   data_moves_on_scale=no
kinesis    parts_to_operate=1 scaling=shard              data_moves_on_scale=no
redpanda   parts_to_operate=2 scaling=broker             data_moves_on_scale=yes
```
The walkthrough of the lines that decide everything:
- `storage_count` — for Kafka and Redpanda this is zero because storage is on the brokers themselves. For Pulsar it's a separate tier; for Kinesis it's hidden behind the API.
- `scaling_unit` — Pulsar's `broker_or_bookie` is the only entry where you can scale the two independently. The others force a coupled decision.
- `data_movement_on_scaling` — Kafka and Redpanda both require partition rebalance when adding capacity. Why this matters in production: a partition reassignment on a 200 TB cluster takes hours and saturates the network during the move. If your cluster is sized for steady-state and you need to absorb a 5× traffic surge, "add brokers" doesn't help quickly — you have to wait out the rebalance. Pulsar and Kinesis avoid this by keeping the data tier separate from the routing tier, so adding routing capacity is fast.
- `operability_score` — a crude finger-in-the-air count of services to monitor, alert on, and patch. Kinesis is "1" because AWS hides everything; Pulsar is "4" because brokers, bookies, and the external metadata service (ZK) are each separate concerns on top of the baseline.
The output isn't a verdict — it's the trade space. Pulsar's parts_to_operate=4 is the cost of its data_moves_on_scale=no benefit. Kinesis's parts_to_operate=1 is what you buy with per-shard cost.
Kinesis: the managed shard model
Amazon Kinesis Data Streams is the message log re-imagined as a managed service. You don't run brokers; you don't tune retention; you don't think about ISR. You provision shards (Kinesis's word for partitions), each with hard caps: 1 MB/sec or 1,000 records/sec write, 2 MB/sec read. You pay $0.015 per shard-hour (~$11/month per shard, baseline) plus per-PUT request charges and per-GB storage if you enable extended retention. A 100-shard stream costs ~₹92,000/month at base, plus traffic charges.
The shard model is a strict capacity contract — exceed 1 MB/sec on a shard and Kinesis returns ProvisionedThroughputExceededException to the producer. This is unlike Kafka, where a partition has no hard cap and the overflow shows up as latency, lag, and broker pressure rather than an explicit error. Kinesis's model is therefore predictable in cost and capacity, less forgiving in burst behaviour. For Swiggy's order-event stream — bursty during the 7–10 p.m. dinner peak — the team explicitly over-provisions Kinesis shards (300 shards even though steady-state needs 80) to absorb the peak; the overhead is direct ₹ on the bill.
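The over-provisioning arithmetic can be sketched directly; the traffic numbers are illustrative, while the per-shard caps are the documented hard limits.

```python
import math

# Shard sizing for a bursty stream. Traffic numbers are illustrative;
# the per-shard caps are the documented hard write limits.
def shards_needed(mb_per_sec: float, records_per_sec: float) -> int:
    # 1 MB/s OR 1,000 records/s write per shard, whichever binds first.
    return max(math.ceil(mb_per_sec / 1.0),
               math.ceil(records_per_sec / 1000.0))

steady = shards_needed(mb_per_sec=60, records_per_sec=80_000)
peak = shards_needed(mb_per_sec=240, records_per_sec=300_000)

shard_month_usd = 0.015 * 24 * 30   # roughly $11 per shard-month at base
overhead = (peak - steady) * shard_month_usd
print(f"steady={steady} shards, peak={peak} shards, "
      f"over-provision cost ~${overhead:.0f}/month")
```

The `max()` is the important line: whichever limit binds first (bytes or record count) sets the shard count, and the gap between peak and steady-state sizing is a direct line on the bill.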
The flip side is that there are no servers to patch, no ZK ensemble to babysit, no rolling restart procedures, no kafka-reassign-partitions.sh runbooks. AWS handles replication (3 AZs, automatic), retention (24 hours default, up to 365 days with extended retention), and cross-AZ failover. The on-call burden for Kinesis is a fraction of Kafka's — at the cost of vendor lock-in and a per-shard pricing model that gets expensive at scale.
The other architectural quirk: Kinesis's read pattern is fan-out per consumer. By default, all consumers of a shard share the 2 MB/sec read budget. With "Enhanced Fan-Out" (an extra ₹130/month per shard per consumer), each consumer gets a dedicated 2 MB/sec stream. This is unlike Kafka, where N consumer groups reading the same partition each get the full disk read bandwidth (since the disk is local and serves the page cache to every reader cheaply). For a topic with 10 downstream consumers, Kafka's broker uses one set of read syscalls; Kinesis charges you 10× the EFO fee. Why this design: Kinesis stores data in a managed multi-tenant fleet shared across customers, so giving each consumer a dedicated bandwidth tier is the only way to enforce tenancy and prevent one noisy consumer from starving another. Kafka, being self-hosted, doesn't have a tenancy problem within a single team's cluster — every reader gets the local disk's full IO budget.
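The fan-out cost is simple multiplication, but it compounds quickly; the per-shard-per-consumer fee below is the figure quoted above, treated here as an assumption.

```python
# Fan-out billing sketch: Enhanced Fan-Out charges per shard, per consumer.
# The fee is the figure quoted in the text, treated as an assumption.
def efo_monthly_inr(shards: int, consumers: int, fee_inr: int = 130) -> int:
    # Each consumer gets a dedicated 2 MB/s pipe on every shard it reads.
    return shards * consumers * fee_inr

# 100 shards, 10 downstream consumers:
print(f"EFO adds ₹{efo_monthly_inr(100, 10):,}/month on top of shard cost")
```

Adding an eleventh consumer group on the Kafka side is effectively free (same local disk, same page cache); on the Kinesis side it's another `shards × fee` line item every month.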
For Aadhaar/UIDAI's authentication telemetry, where the volume is steady and known and the cost predictability is the main requirement, Kinesis is a defensible choice. For Razorpay's payment events with 5-year compliance retention and ~250 TB total, Kinesis's per-shard pricing makes the bill prohibitive — Kafka with tiered storage costs about a fifth of Kinesis at that scale.
Redpanda: the same architecture, written in C++
Redpanda's bet is different from Pulsar's. It accepts Kafka's "broker-coupled storage" architecture as correct — but argues that Kafka's implementation leaves performance on the table because of the JVM and the page cache. Redpanda re-implements the Kafka wire protocol in C++ on the Seastar framework with thread-per-core architecture: each CPU core is a single-threaded reactor handling its own subset of partitions, with no shared mutable state across cores and no synchronisation overhead. Disk IO bypasses the kernel page cache — Redpanda manages its own buffers and uses Direct Memory Access (DMA) to read and write segments.
The performance claim, validated in independent benchmarks, is roughly: same throughput on a quarter of the hardware, lower and more predictable p99 latency (no GC pauses, no page-cache eviction storms). For a Bengaluru fintech running their event bus on 12 m5.4xlarge Kafka brokers (~₹4.5 lakh/month), the equivalent Redpanda cluster fits on 4 m6id.2xlarge nodes (~₹1.1 lakh/month) at the same throughput. The savings compound at scale.
The architectural trade-offs:
- Kafka API compatible, byte-for-byte. A Java app written against `kafka-clients` works against Redpanda with one config change. This is a contract, not a port — Redpanda runs the official Kafka compatibility test suite as part of CI.
- No ZooKeeper, no KRaft. Redpanda has Raft built into every broker for both data replication and metadata. One binary, no external metadata dependency. This is operationally simpler than Kafka 3.x (which needs at least KRaft controllers).
- No JVM. No GC pauses, no `-Xmx` tuning, no JMX metrics — instead, a Prometheus endpoint and the `rpk` CLI. Memory is managed manually in C++; the operational mental model is closer to a database engine than a Kafka deployment.
- Thread-per-core. Adding cores adds throughput linearly until you saturate disk or network. But it also means partition count must be ≥ core count for the cores to be utilised — a single-partition topic on a 16-core box uses 1/16th of the broker's CPU.
- Tiered storage built in (Shadow Indexing). Same model as Kafka's KIP-405 but more mature in production; it shipped before Kafka's tiered storage went GA.
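The partition-count caveat can be sketched as a toy utilisation model of thread-per-core pinning; this is a simplification, not Redpanda's actual scheduler.

```python
# Toy utilisation model of thread-per-core pinning (a simplification,
# not Redpanda's actual scheduler).
def core_utilisation(partitions: int, cores: int) -> float:
    # Each partition is pinned to exactly one core; a core with no
    # partition assigned sits idle for that topic's traffic.
    busy = min(partitions, cores)
    return busy / cores

print(f"1 partition on 16 cores: {core_utilisation(1, 16):.0%} of the broker")
print(f"16+ partitions on 16 cores: {core_utilisation(16, 16):.0%}")
```

The practical consequence: topic partition counts that were "enough" on a JVM Kafka broker (where any thread can serve any partition) may leave cores idle on a thread-per-core engine.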
The catch is that Redpanda is a single-vendor project (Redpanda Data, formerly Vectorized) — no Apache foundation governance, no large open-source community. Production support requires their commercial offering. For organisations that prioritise vendor neutrality, Apache Kafka remains the safer choice; for organisations that prioritise per-broker efficiency and operational simplicity, Redpanda is meaningfully better than Kafka on the same hardware.
Picking among them
The decision tree, when you actually sit down to choose:
| Constraint | Default choice | Why |
|---|---|---|
| Already on AWS, < 100 TB total, predictable load | Kinesis | Lowest ops burden, predictable cost up to that scale |
| Self-hosted, large existing Kafka investment, want vendor neutrality | Kafka | Largest ecosystem, most operators know it |
| Self-hosted, want decoupled storage and broker scaling | Pulsar | Stateless brokers + BookKeeper is unique to Pulsar |
| Self-hosted, want the lowest hardware cost for Kafka API | Redpanda | C++ + thread-per-core gives ~4× hardware efficiency |
| Multi-tenant SaaS (per-customer isolation) | Pulsar | Native multi-tenancy with namespaces, isolation, quotas |
| Multi-region active-active | Kafka with MirrorMaker 2 or Confluent Cluster Linking | Most mature replication tooling |
| < ₹50,000/month budget, < 10k events/sec | Kinesis or single-broker Kafka | Below the threshold where ops complexity matters |
| > 1M events/sec sustained, cost-sensitive | Redpanda or Kafka with tiered storage | Both can hit this; Redpanda needs less metal |
The pattern across all four: the right answer depends on the cost layer that dominates at your scale. At < 10k events/sec, ops cost dominates and managed Kinesis wins. At 1M+ events/sec, hardware cost dominates and Redpanda or tiered Kafka wins. Pulsar's broker/storage decoupling pays off when broker scaling and storage scaling have different time profiles — bursty connection counts with stable bytes, or vice versa. There's no universally correct choice; there are scale-and-priority-dependent choices.
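The decision table can be restated as a deliberately crude chooser; the thresholds are this chapter's rules of thumb, not hard limits, and the single-answer returns gloss over rows where two systems tie.

```python
# The decision table above as a crude chooser. Thresholds are this
# chapter's rules of thumb, not hard limits.
def default_choice(events_per_sec: int, self_hosted: bool,
                   decoupled_storage: bool = False,
                   multi_tenant: bool = False) -> str:
    if not self_hosted and events_per_sec < 10_000:
        return "kinesis"        # ops cost dominates at small scale
    if multi_tenant:
        return "pulsar"         # native tenant/namespace isolation
    if decoupled_storage:
        return "pulsar"         # stateless brokers + BookKeeper
    if events_per_sec > 1_000_000:
        return "redpanda"       # hardware cost dominates (or tiered Kafka)
    return "kafka"              # default: ecosystem + vendor neutrality

assert default_choice(5_000, self_hosted=False) == "kinesis"
assert default_choice(2_000_000, self_hosted=True) == "redpanda"
assert default_choice(100_000, self_hosted=True) == "kafka"
```

The ordering of the branches is itself a claim: tenancy and storage-decoupling requirements trump raw throughput, because they're architectural properties you can't bolt on later.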
One pragmatic note for Indian-context teams: data residency for RBI-regulated workloads (payments, banking) often constrains cloud-managed options. Kinesis pinned to ap-south-1 keeps data in-region, which satisfies residency for most teams; AWS MSK is the closer Kafka-managed option but carries similarly locked-in pricing. For Razorpay, PhonePe, Cred, and similar regulated platforms, self-hosted Kafka or Redpanda on EKS (with EBS for durability) tends to be the chosen path — the regulator wants direct visibility into where bytes are stored, and managed services obscure that.
Common confusions
- "Pulsar is just Kafka with extra steps." Pulsar's broker/BookKeeper split is a different architecture, not a different deployment. The broker is stateless — that property does not exist in Kafka and cannot be retrofitted. The trade-off is operating two services instead of one.
- "Kinesis is a Kafka clone." Kinesis predates Kafka's broad adoption and uses a shard model with hard per-shard rate limits. It's API-compatible with the Kinesis Producer/Consumer Libraries, not with Kafka clients (without a translation layer like Kafka-Kinesis Connector or AWS MSK).
- "Redpanda is faster because it's written in C++." The C++ implementation matters less than the architectural decisions: thread-per-core, no page cache (DMA), no JVM. A C++ Kafka clone that copied JVM-Kafka's design would not be meaningfully faster. The performance comes from the IO model, not the language.
- "BookKeeper is just an alternative Kafka segment store." BookKeeper's ledger model and quorum semantics are different from Kafka's segment+replication model. A BookKeeper write is acknowledged when `qa` of `qw` bookies persist it, not when a leader writes and followers catch up. The failure modes are different — leader election vs ledger fencing.
- "Tiered storage in Kafka and Redpanda are the same thing." Both move cold segments to S3, but Redpanda's Shadow Indexing landed in production years before Kafka's KIP-405 GA and supports more aggressive cold-cache strategies. They're conceptually equivalent; the maturity of the implementation differs.
- "Kinesis has unlimited throughput because it's managed." Each shard has a hard cap of 1 MB/sec write and 2 MB/sec read, and you pay per shard. Beyond the cap you get explicit `ProvisionedThroughputExceeded` errors — there is no "spillover" mode. Capacity planning in Kinesis is more rigid than in Kafka.
Going deeper
BookKeeper ledgers vs Kafka segments
A BookKeeper ledger is an append-only sequence of entries belonging to one logical stream. Each entry is replicated synchronously to the write-quorum bookies (qw, typically 3 or 5) and acknowledged once qa of them (typically 2 or 3) confirm. When the topic owner (Pulsar broker) decides to "roll" the ledger — daily, on size threshold, or on broker reassignment — it closes the current ledger and opens a new one. The closed ledger is now immutable across all qw bookies. This is structurally similar to Kafka's segment files, but the replication granularity is the entry, not the segment. A bookie failure mid-segment is recovered by re-replicating only the entries it had, not the whole segment. This finer-grained recovery is one reason Pulsar promotes itself as having "faster failure recovery" — the recovery surface area is smaller.
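A toy model of the quorum-ack and recovery-surface semantics described above; the entry counts are invented for illustration.

```python
# Toy model of BookKeeper's entry-level quorum ack. Counts are invented.
QW, QA = 3, 2   # write quorum, ack quorum

def entry_acked(confirmed_bookies: int) -> bool:
    # An entry write unblocks the client once qa of the qw bookies it
    # was sent to have persisted it; the slowest bookie can lag.
    assert confirmed_bookies <= QW
    return confirmed_bookies >= QA

assert not entry_acked(1)   # still waiting for the quorum
assert entry_acked(2)       # qa reached: acknowledged; third bookie lags

# Recovery surface on bookie failure: only the entries that bookie held
# are re-replicated, not whole segments (invented figures).
ledger_entries = 1_000_000
entries_on_failed_bookie = 600_000
print(f"re-replicate {entries_on_failed_bookie:,} entries, "
      f"not the full {ledger_entries:,}-entry ledger")
```

Contrast with Kafka, where a follower that falls out of the ISR mid-segment catches up by fetching whole segment ranges from the leader; BookKeeper's recovery unit is the entry.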
Why thread-per-core matters for tail latency
In a JVM-based Kafka broker, a single producer write touches several threads: a network IO thread, a request handler thread, a log writer thread, possibly a replicator thread. Each handoff is a context switch and a possible queueing delay. In Redpanda's thread-per-core model, the same write stays on one core from the network socket through to the disk write. Why this collapses p99: queueing delay is the dominant component of p99 latency in a multi-threaded server. Each handoff between threads adds variance — a thread that's scheduled out at the wrong moment adds milliseconds of stall. Eliminating the handoffs by keeping everything on one core eliminates the queueing variance, and the p99 collapses toward the median. This is the same trick that DPDK and Seastar use in network-heavy workloads. Redpanda's published benchmarks show p99 latencies of 5–10 ms under load where Kafka shows 50–200 ms, and the gap is all queueing.
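A toy queueing simulation makes the variance-compounding argument concrete; the stall probability and duration below are invented, not Kafka or Redpanda measurements.

```python
import random

# Toy queueing simulation of the handoff argument. Stall probability and
# duration are invented, not Kafka/Redpanda measurements.
random.seed(42)

def request_latency_us(handoffs: int, service_us: float = 100.0) -> float:
    # Each stage: fixed service time plus an occasional scheduling stall.
    total = 0.0
    for _ in range(handoffs):
        total += service_us
        if random.random() < 0.02:                 # 2% stall chance per hop
            total += random.uniform(1_000, 5_000)  # 1-5 ms scheduling stall
    return total

def p99(samples):
    return sorted(samples)[int(len(samples) * 0.99)]

multi = [request_latency_us(4) for _ in range(10_000)]   # 4-thread pipeline
single = [request_latency_us(1) for _ in range(10_000)]  # thread-per-core
print(f"p99: multi-hop {p99(multi) / 1000:.1f} ms, "
      f"single-hop {p99(single) / 1000:.1f} ms")
```

The fixed service cost differs by only a few hundred microseconds between the two; the millisecond-scale p99 gap comes entirely from the compounding chance of hitting at least one stall across four hops.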
Multi-tenancy: where Pulsar wins
Pulsar's namespace model — tenant/namespace/topic — gives you native isolation with per-tenant quotas, throttling, and authentication. Kafka has ACLs and quotas but no native concept of "tenant"; multi-tenancy is bolted on via shared cluster + naming convention + management tooling. For a SaaS platform like a hypothetical Indian B2B-SaaS company onboarding 200 customers, each with their own event streams, Pulsar's namespace isolation maps directly to "one customer = one namespace" and the quota and access controls are first-class. Doing the same in Kafka requires per-customer prefix conventions and external tooling to enforce isolation, which is workable but not as clean.
What the Kafka community is doing about Redpanda's pressure
Apache Kafka 3.x has shipped KRaft (no ZK), tiered storage, and a steady stream of latency improvements in response to the Redpanda pressure. The Strimzi operator has simplified Kubernetes deployment. WarpStream — a Kafka-API-compatible engine that uses S3 as primary storage, Pulsar-style compute/storage separation behind the Kafka protocol — is a 2024–2025 development that aims to bring the Kafka ecosystem to the same cost-on-S3 architecture Pulsar already had. The arms race is alive — by 2027 the operational gap between Kafka and Redpanda may be smaller than it is today, and the case for Redpanda will rest more on "single binary, no JVM" than on raw throughput.
Kinesis Firehose vs Kinesis Data Streams: a common confusion
Two AWS products share the "Kinesis" name but solve different problems. Kinesis Data Streams (the subject above) is the Kafka-equivalent message log: producers write, consumers read by sequence number, replay is possible, retention is up to 365 days. Kinesis Data Firehose is a managed delivery pipeline — producers write, Firehose batches the records and writes them to a destination (S3, Redshift, OpenSearch, an HTTP endpoint), and there is no consumer model and no replay. Firehose is conceptually closer to Kafka Connect with an S3 sink than to Kafka itself. Teams choosing between the two should ask: "do I need to read the stream from multiple consumers later?" If yes, Data Streams. If the only goal is "land these records in S3 with batching", Firehose is a quarter the cost of running Data Streams + a custom S3 sink. Mixing them up — using Firehose where Data Streams was needed — is one of the most common AWS data-engineering mistakes, because once you commit and start writing, switching costs are real (no replay means historical data on the wrong product is lost).
Why nobody just uses one of them everywhere
The honest answer to "which one wins" is "different teams within the same company often use different ones". A Razorpay platform might use Kafka for the core payments event bus (the SLO-bearing path), Kinesis for an analytics ingestion pipeline (where the AWS team's expertise dominates), and a small Redpanda cluster for an internal-tools team that doesn't want to share the production Kafka. The reason is that the cost equation differs by team — the platform team's bottleneck is hardware cost (so Kafka with tiered storage), the analytics team's bottleneck is people-time (so Kinesis), the internal-tools team's bottleneck is "don't break the production cluster with our experimental workload" (so Redpanda in a separate cluster). The shared API across all four makes this multi-system reality tractable: producers and consumers don't change. The result is that "which message log" is rarely a one-system answer at any organisation past 1000 engineers.
Where this leads next
The architectural comparison closes Build 7 — the message log is no longer one system, it's a class of systems with shared API and divergent storage layers. The next chapter, /wiki/zookeeper-vs-kraft-and-the-controllers-job, looks inside Kafka specifically to understand how the controller — the brain managing leader election, ISR transitions, and metadata — works in the post-ZK world. That's the operational substrate that makes any of these comparisons meaningful: until you understand what the controller does, you can't compare "controller-in-the-broker" (Redpanda, KRaft Kafka) versus "controller-in-a-separate-tier" (BookKeeper + ZK Pulsar).
Build 8 picks up the thread on top of any of these. /wiki/the-stream-processing-mental-model-events-state-time treats the message log as a substrate and asks "what state do you keep alongside the stream, and how do you handle late-arriving events". Most of Build 8's content is independent of which message log you chose — the stream-processing layer (Flink, Kafka Streams, Materialize) talks to whichever log via the Kafka protocol. The choice you made in Build 7 mostly affects cost and ops, not the abstractions Build 8 builds on top.
The mental model to take forward: the message log is a commodity API, the storage architecture under it is the trade-off space, and the choice depends on whose money is more expensive — operations team time, hardware cost, or vendor dependency. A senior data engineer at Razorpay or PhonePe can articulate which of those three dominates for their specific platform, and the right system follows.
References
- Apache Pulsar — Architecture overview — the canonical Pulsar architecture doc covering broker statelessness and BookKeeper.
- DistributedLog: A High Performance Replicated Log Service — academic background for building replicated logs on BookKeeper's ledger model.
- Redpanda — Why we built Redpanda in C++ — the Vectorized engineering blog on thread-per-core and DMA design.
- AWS Kinesis Data Streams — Service Quotas and Limits — the per-shard limits that drive Kinesis cost models.
- Confluent — KIP-500: Replace ZooKeeper with KRaft — the design doc for Kafka's removal of ZooKeeper.
- WarpStream — A Stateless Kafka on S3 — the recent Kafka-API-on-S3 architecture, blurring the Pulsar/Kafka line.
- /wiki/replication-and-isr-how-kafka-stays-up — the Kafka-specific replication mechanism this chapter compares against Pulsar's BookKeeper quorum.
- /wiki/retention-compaction-tiered-storage — the previous chapter; tiered storage in Kafka, which Redpanda shipped earlier and Pulsar gets via offloaders.