Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.
Exactly-once and the semantics debate
PaySetu's reconciliation team noticed a pattern around 3 a.m. on the last Tuesday of every quarter — between forty and sixty merchants would each receive a duplicate ₹0 settlement entry that perfectly cancelled itself a millisecond later. The amounts were zero, the books always balanced, no merchant ever complained. It would have stayed buried in a Splunk dashboard forever, except that a junior SRE named Asha pulled the thread and discovered that the payments worker fleet was stalling through long JVM full-GC pauses every quarter-end (driven by report generation) and being dropped from its consumer groups, Kafka was redelivering the in-flight transaction events on consumer restart, and the downstream ledger was writing both copies — once with the original amount, once with an inverted compensating entry triggered by a guard the team had built years ago "just in case". The entire pipeline was advertised as "exactly-once" in the architecture deck. It was not. It was at-least-once delivery, plus a duct-tape idempotency layer that mostly worked, plus a self-healing compensator that made the failure mode invisible. That description fits roughly 90% of the production "exactly-once" systems you will ever encounter.
The phrase exactly-once is one of the most semantically loaded two-word claims in distributed systems, and almost every team that uses it means something subtly different. This chapter is about what the phrase can and cannot mean, why the network layer alone cannot deliver it, what Kafka's transactional API actually buys you, and the techniques — idempotency, deduplication, transactional outbox — that real systems use to ship "as if exactly once" while remaining honest about the underlying delivery semantics.
Exactly-once delivery between two processes over an asynchronous network is provably impossible — the Two Generals problem and FLP impossibility put a fence around it. Exactly-once processing is achievable and is what production systems mean: at-least-once delivery plus a state-mutation layer that absorbs duplicates without changing observable state. The three load-bearing techniques are idempotent operations, dedup tables keyed by message ID, and transactional commits that bind message-offset advance to state mutation in one atomic step. Kafka's "exactly-once" feature implements the third — within a Kafka topology only.
Why exactly-once delivery is mathematically off the table
Two processes communicating over an unreliable network cannot agree on whether a single message was delivered, ever. The proof is two generals trying to coordinate an attack: general A sends a message to general B, but cannot tell whether B received it without an acknowledgement from B; B sends an ack but cannot tell whether A received the ack without an ack-of-ack; the recursion never terminates. This is not a quirky thought experiment — it is the floor on what any messaging protocol can guarantee. You can have at-most-once (send, never retry, accept loss), at-least-once (send, retry until ack, accept duplicates), or you can fudge it with timeouts and pretend, but the asymmetry is fundamental: the sender cannot know what the receiver did with the message.
Why this is not just an academic distinction: every retry-on-timeout client is at-least-once. The moment your HTTP client has a retry loop, your gRPC stub has a deadline-and-retry policy, or your Kafka producer has retries > 0, you are running an at-least-once system. There is no configuration flag that turns this off and still preserves liveness — if you set retries to zero, you become at-most-once, and you lose messages on the first network blip. So every production system has duplicates flowing through it; the only choice is whether the duplicates corrupt state or are absorbed silently. Teams that say "we use TCP, so duplicates can't happen" have confused TCP's in-stream deduplication (which works inside one connection) with end-to-end delivery semantics (which span connection retries, broker restarts, and consumer crashes). TCP saves you within a single transport hop; it has no opinion about what happens when your application-level retry kicks in after the connection drops.
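A toy simulation makes the asymmetry concrete: the server applies the side effect but the ack is lost, so the client cannot distinguish "request lost" from "ack lost" and must retry. Everything here — names, the fake server, the single dropped ack — is illustrative.

# A minimal simulation of why retry-on-timeout means at-least-once.
received = []

def server(msg, drop_ack):
    received.append(msg)          # side effect happens regardless of the ack
    if drop_ack:
        raise TimeoutError        # ack never reaches the client

def send_with_retry(msg, retries=3):
    for attempt in range(retries):
        try:
            server(msg, drop_ack=(attempt == 0))  # first ack is eaten
            return                                # ack received, done
        except TimeoutError:
            continue                              # retry: at-least-once

send_with_retry("debit-100")
print(received)   # ['debit-100', 'debit-100'] — the duplicate is structural

Delete the retry loop and the same code becomes at-most-once: the side effect may or may not have happened, and the client will never know which.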
The substitute that actually ships is exactly-once processing: duplicates may arrive, but the receiver's state is mutated as if exactly one delivery happened. This is what every production system advertised as "exactly-once" really delivers, and the substitution is fine — what users care about is whether the bank balance was debited once or twice, not how many bytes traversed the wire.
The three techniques that actually deliver exactly-once processing
Exactly-once processing is achieved by one (or a combination) of three techniques, and any system claiming the property will be using at least one of them. The techniques have different cost profiles and different failure modes, and most production systems pick the technique that matches the cheapest constraint of their workload.
The first technique is idempotent operations: design state mutations so that applying the same message twice produces the same final state as applying it once. SET balance = 5000 is idempotent; balance = balance - 100 is not. PaySetu's KYC service stores merchant verification status using INSERT INTO kyc_status (merchant_id, status, verified_at) VALUES (...) ON CONFLICT (merchant_id) DO UPDATE — running the same message twice produces the same row. Idempotency is the cleanest technique because it requires no extra storage or coordination; the application's data model is the dedup mechanism. The catch is that it only works for state mutations that can be expressed idempotently. A pure increment — "add ₹100 to balance" — cannot. To make increments idempotent you must thread the message ID into the operation: "add ₹100 to balance, but only if message m-9237 has not been applied before", which sneaks technique two through the back door.
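To make the upsert concrete, here is a minimal runnable sketch using SQLite's ON CONFLICT clause (the Postgres syntax is nearly identical); the table shape and values are illustrative, not PaySetu's actual schema.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE kyc_status (
    merchant_id TEXT PRIMARY KEY,
    status      TEXT,
    verified_at TEXT)""")

def apply_kyc_event(merchant_id, status, verified_at):
    # Applying the same event twice leaves the same single row behind.
    db.execute("""INSERT INTO kyc_status (merchant_id, status, verified_at)
                  VALUES (?, ?, ?)
                  ON CONFLICT (merchant_id) DO UPDATE
                  SET status = excluded.status,
                      verified_at = excluded.verified_at""",
               (merchant_id, status, verified_at))

apply_kyc_event("m-42", "verified", "2024-01-01")
apply_kyc_event("m-42", "verified", "2024-01-01")   # duplicate delivery
print(db.execute("SELECT * FROM kyc_status").fetchall())  # one row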
The second technique is deduplication tables: maintain a table of recently-seen message IDs and refuse to process a message whose ID is already in the table. The dedup table can live in the same database as the state, in a side cache (Redis, Memcached), or in the broker. The key engineering decision is the retention window — keep dedup entries for as long as duplicates can plausibly arrive. For a Kafka pipeline whose producer sets retries=Integer.MAX_VALUE and whose consumer session timeout tops out at 5 minutes, the duplicate window is bounded by the session timeout plus the producer's retry backoff — typically under 10 minutes. Holding 10 minutes of message IDs is cheap. For a system that retries on a 24-hour cron, the dedup table grows huge. KapitalKite's order-routing service uses a 24-hour dedup window keyed by client-generated order_id, stored in Redis with TTL, and the dedup table runs at around 2 GB on a quiet day and 14 GB during a market-open spike. The cost is real and proportional to the duplicate window times the message rate.
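A minimal sketch of the Redis-backed variant, assuming a reachable local Redis; the key prefix, TTL, and handler shape are illustrative. SET with NX and EX is atomic, so the existence check and the TTL'd insert cannot race.

import redis

r = redis.Redis()
DEDUP_TTL_SECONDS = 24 * 60 * 60   # 24-hour duplicate window

def process_once(order_id, handler):
    # Returns True if handled, False if recognised as a duplicate.
    is_first = r.set(f"dedup:order:{order_id}", 1,
                     nx=True, ex=DEDUP_TTL_SECONDS)
    if not is_first:
        return False               # duplicate within the window: skip
    handler(order_id)              # any state mutation, idempotent or not
    return True

Note the ordering trade-off baked into this sketch: marking before handling means a crash between the two loses the message, while marking after handling re-opens the duplicate window. Collapsing the mark and the mutation into one atomic step is exactly what technique three buys.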
The third technique is transactional commits that bind state mutation and message-offset advance: the consumer reads message m at offset k, mutates state, and advances the consumer offset to k+1, all inside one atomic transaction. If the transaction commits, both the state change and the offset advance happen together; if it aborts, neither does. On the next read the consumer either sees offset k (and reprocesses, but the previous attempt's state change was rolled back) or k+1 (and the previous state change is already committed). There is no in-between state, so duplicates cannot occur — the message is processed exactly once from the consumer's point of view. This is what Kafka's "exactly-once semantics" feature implements: the producer's writes, the consumer's offset commits, and (in the case of stream processing) the next-stage producer's writes can all be wrapped in a Kafka transaction. The catch — and it is a big one — is that the transaction must include every system whose state is mutated. If your consumer writes to Kafka and Postgres, you need a transaction spanning both, which Kafka's API does not give you. You either use Kafka's transaction (state in Kafka only) or do a two-phase commit across systems (slow, fragile) or fall back to idempotency / dedup at the database layer.
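Kafka keeps that transaction inside the broker; the same shape also works with the consumer offset stored in the database that holds the state, so one local transaction covers both. A minimal sketch with SQLite standing in for that database — schema and message shape are illustrative:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE balances (merchant_id TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE consumer_offset (partition_id INTEGER PRIMARY KEY, next_offset INTEGER);
    INSERT INTO consumer_offset VALUES (0, 0);
""")

def process(offset, merchant_id, amount):
    with db:  # one atomic transaction: state mutation + offset advance
        (expected,) = db.execute(
            "SELECT next_offset FROM consumer_offset WHERE partition_id = 0"
        ).fetchone()
        if offset < expected:
            return  # already committed on a previous attempt: skip
        db.execute("""INSERT INTO balances VALUES (?, ?)
                      ON CONFLICT (merchant_id) DO UPDATE
                      SET balance = balance + excluded.balance""",
                   (merchant_id, amount))
        db.execute("UPDATE consumer_offset SET next_offset = ? WHERE partition_id = 0",
                   (offset + 1,))

process(0, "m-A", 1000)
process(0, "m-A", 1000)   # redelivery of offset 0 is skipped atomically
print(db.execute("SELECT * FROM balances").fetchall())  # [('m-A', 1000)]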
Why the third technique stops at the broker boundary: a Kafka transaction is internal to a Kafka cluster. It atomically commits writes to Kafka topics and offset commits in the consumer-group offset topic. It cannot reach into your Postgres database, your S3 bucket, your downstream payment gateway, or your email service. So Kafka's "exactly-once" delivers exactly-once when the entire pipeline is Kafka-to-Kafka — read from topic A, transform, write to topic B, commit transaction. The moment a side effect leaves the broker — a database write, an HTTP call, a payment instruction — the guarantee evaporates and you are back to at-least-once delivery into that external system. To recover exactly-once for the side effect, you compose Kafka's transaction with idempotency or a dedup table on the receiving side, typically using the message-ID or offset as the dedup key. This is why production "exactly-once" pipelines are layered: Kafka EOS handles the broker-to-broker hop, idempotent or dedup-keyed downstream handlers handle the broker-to-database hop, and the whole composition delivers what users would call exactly-once. Calling the composite "Kafka exactly-once" without naming the downstream layer is the source of most production confusion.
A code walkthrough — at-least-once vs idempotent vs dedup
To make the difference concrete, here is the same payment-processing handler implemented three ways: naïvely (at-least-once, duplicates corrupt the balance), idempotently (the operation absorbs duplicates), and with a dedup table (any operation, but with extra storage). Each version processes the same five messages, including one duplicate.
# Three approaches to the same payment-processing problem.
# Stream of payment events, one duplicate (txn-3 appears twice).
events = [
    {"id": "txn-1", "merchant": "m-A", "amount": 1000},
    {"id": "txn-2", "merchant": "m-A", "amount": 500},
    {"id": "txn-3", "merchant": "m-B", "amount": 2000},
    {"id": "txn-3", "merchant": "m-B", "amount": 2000},  # duplicate
    {"id": "txn-4", "merchant": "m-A", "amount": 300},
]

# ---------- 1. Naive at-least-once ----------
def naive():
    bal = {}
    for e in events:
        bal[e["merchant"]] = bal.get(e["merchant"], 0) + e["amount"]
    return bal

# ---------- 2. Idempotent (message ID threaded into the op) ----------
def idempotent():
    bal, seen = {}, {}  # seen: per-merchant set of applied IDs
    for e in events:
        s = seen.setdefault(e["merchant"], set())
        if e["id"] in s: continue
        s.add(e["id"])
        bal[e["merchant"]] = bal.get(e["merchant"], 0) + e["amount"]
    return bal

# ---------- 3. Dedup table (separate, supports any op) ----------
def with_dedup():
    bal, dedup = {}, set()  # dedup: global set of message IDs (TTL'd in prod)
    for e in events:
        if e["id"] in dedup: continue
        dedup.add(e["id"])
        bal[e["merchant"]] = bal.get(e["merchant"], 0) + e["amount"]
    return bal

print("naive", naive())
print("idempotent", idempotent())
print("with_dedup", with_dedup())
Output:
naive {'m-A': 1800, 'm-B': 4000}
idempotent {'m-A': 1800, 'm-B': 2000}
with_dedup {'m-A': 1800, 'm-B': 2000}
Walk through the load-bearing pieces. The naïve version applies every event blindly, and the duplicate txn-3 adds ₹2000 to merchant B twice — the balance is wrong by ₹2000. This is what every production system looks like before somebody adds a dedup layer; the duplicates compound silently until reconciliation catches them. The idempotent version threads e["id"] into the operation by checking a per-merchant seen set before applying — the second txn-3 hits the if e["id"] in s guard and is skipped. Notice that "idempotent" here is not free; we built dedup state into the handler. The dedup-table version uses a global dedup set keyed by message ID, independent of the operation. This generalises — the handler can be any state mutation, not just balance increments — at the cost of a separate storage surface that needs garbage collection (TTL in production). The two safe versions produce identical correct output (m-B: 2000), and the unsafe naïve version produces wrong output (m-B: 4000); the duplicate did happen, but only the safe versions absorbed it without consequence.
In real systems, the dedup set lives in Redis or in a Postgres processed_messages table with (message_id, processed_at) and a periodic cleanup. The TTL is the duplicate window — typically the consumer's session timeout plus producer retry budget, with a safety multiplier. PaySetu uses 30 minutes for Kafka-driven dedup and 7 days for HTTP-webhook-driven dedup (because external retry windows are longer and less predictable).
What Kafka's "exactly-once" actually buys you, and where it stops
Kafka introduced transactional and idempotent producers in 0.11 (2017) and called the combination Exactly-Once Semantics, abbreviated EOS. The marketing was inevitable; the engineering reality is more nuanced. EOS gives two guarantees, both within a single Kafka cluster. First, the idempotent producer tags each producer with a producer_id and each message with a sequence number; the broker rejects duplicate sequence numbers from the same producer, so a producer that retries because the network ate the ack does not produce duplicates on the topic. Second, the transactional producer lets a single producer atomically write to multiple partitions and commit consumer offsets, all in one Kafka transaction; consumers running in read_committed mode see only committed messages.
Compose those two and you get end-to-end exactly-once for the canonical Kafka-to-Kafka stream-processing topology: read from topic A, transform, write to topic B, commit consumer-offset-on-A and message-on-B in one transaction. If the consumer crashes mid-transaction, the transaction aborts, the partial output on B is hidden from read_committed consumers, and the offset on A stays where it was — restart proceeds without duplication or loss. Kafka Streams uses this end-to-end; ksqlDB uses it; Flink can use it for Kafka sinks.
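A minimal sketch of that loop using the confluent-kafka Python client, assuming a reachable broker; the topic names, group ID, transactional ID, and the no-op transform are illustrative, and error handling is reduced to abort-and-continue. Setting transactional.id also enables the idempotent producer.

from confluent_kafka import Consumer, Producer, TopicPartition, KafkaException

def transform(value):
    return value  # illustrative no-op transform

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "settle-transform",
    "isolation.level": "read_committed",  # hide aborted transactions
    "enable.auto.commit": False,          # offsets travel in the transaction
})
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "settle-transform-1",  # stable across restarts
})
producer.init_transactions()   # fences zombie producers with the same ID
consumer.subscribe(["topic-a"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    try:
        producer.produce("topic-b", transform(msg.value()))
        # The offset commit on A joins the same transaction as the write to B.
        producer.send_offsets_to_transaction(
            [TopicPartition(msg.topic(), msg.partition(), msg.offset() + 1)],
            consumer.consumer_group_metadata())
        producer.commit_transaction()   # both land together
    except KafkaException:
        producer.abort_transaction()    # output hidden, offset unmoved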
What EOS does not give you is exactly-once delivery into anything outside Kafka. The moment a Kafka consumer writes to Postgres, S3, an HTTP API, or a downstream queue, EOS stops applying — the external system has no notion of the Kafka transaction. To extend exactly-once across that boundary, the receiver must implement idempotency or dedup, and the producer must include a stable message ID. PaySetu's standard pattern threads the Kafka offset (or producer-id × sequence-number) into a dedup column on the destination row. The Kafka transaction handles broker-to-broker safety; the dedup column handles broker-to-database safety. Composed, the pipeline is exactly-once end-to-end. Take away either layer and duplicates leak.
Why this composition matters for system design and not just for architecture diagrams: the dedup column is typically a unique constraint on (message_id) in Postgres, with the application catching the unique-violation exception and treating it as "already processed". This pattern has the property that the database becomes the source of truth for what has been processed, which means even if the Kafka offset is somehow rewound (say, an operator accidentally resets the consumer group), the dedup column will silently skip everything that was already in the database, and the system self-heals to the correct state. The pattern survives operator error in a way that pure Kafka EOS does not — Kafka EOS is defensive against network and crash failures, but a manual offset reset blows past it. Threading the dedup constraint into the destination row gives you a defence in depth that costs one column and one unique index. This is the technique that turns "exactly-once on paper" into "exactly-once even when humans get involved".
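A minimal sketch of the dedup-column pattern, with SQLite standing in for Postgres; the table shape and message-ID format are illustrative. The unique-violation catch is the whole trick: because the constraint lives on the destination row, the check and the write are one atomic statement, and a replayed message has no window to slip through.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE settlements (
    message_id  TEXT UNIQUE,     -- Kafka offset or producer-id x sequence
    merchant_id TEXT,
    amount      INTEGER)""")

def apply_settlement(message_id, merchant_id, amount):
    try:
        with db:
            db.execute("INSERT INTO settlements VALUES (?, ?, ?)",
                       (message_id, merchant_id, amount))
    except sqlite3.IntegrityError:
        pass  # duplicate delivery, offset rewind, or manual replay: no-op

apply_settlement("p17-o40312", "m-B", 2000)
apply_settlement("p17-o40312", "m-B", 2000)   # replayed: silently skipped
print(db.execute("SELECT count(*) FROM settlements").fetchone())  # (1,)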
Common confusions
- "Exactly-once delivery is the same as exactly-once processing." Delivery is a network property — bytes traversing a wire — and is impossible to guarantee. Processing is a state-mutation property — state changes happen as if exactly once — and is achievable through idempotency or dedup. Every "exactly-once" production system means the second one.
- "Kafka EOS gives me exactly-once for my whole pipeline." Only inside Kafka. As soon as your consumer writes to a database, calls an external API, or sends an email, you need an additional dedup or idempotency layer on the receiving side.
- "TCP guarantees no duplicates so I don't need dedup." TCP deduplicates inside a single connection. End-to-end retries — application-level retry, broker redelivery, consumer crash recovery — happen across TCP connections and have no connection-internal sequence numbers to coordinate on. TCP saves you from one class of duplicate; dedup saves you from the rest.
- "At-least-once + idempotent = exactly-once, so why the debate?" Because making operations idempotent is sometimes hard or impossible (e.g., "send an email" is not idempotent without a dedup table around it), and dedup tables have storage cost and TTL choices that affect correctness. The debate is about the right technique per workload, not about whether the goal is reachable.
- "Two Generals only applies to military scenarios; modern networks have TCP and ACKs." Two Generals applies to any asynchronous communication channel where messages can be lost — which is every network in the universe. The proof is independent of the transport protocol.
- "Idempotency means the operation has no side effects." No. Idempotent means the same input applied twice produces the same final state as applied once.
DELETE FROM users WHERE id = 5is idempotent — the row is gone after the first call, and the second call deletes nothing more. It still has a side effect.
Going deeper
How idempotency keys work in payment APIs
Stripe, Square, and most modern payment gateways expose an idempotency key as a header on POST requests. The client generates a unique key per logical request (typically a UUID), and the server stores (idempotency_key → response) for 24 hours. If the same key arrives twice, the server returns the cached response instead of re-executing. The client's retry policy is now safe — retrying a charge with the same idempotency key cannot produce two charges. The trick that makes this work is that the client generates the key before the first attempt and reuses it across all retries. If the client generates a fresh UUID per retry (a common bug), idempotency keys do nothing. PaySetu's payment SDK enforces idempotency-key reuse by binding the key to the merchant-side request object and refusing to let it change across retries. The pattern generalises far beyond payments — any external API call from a retrying client should accept an idempotency key, and any service-to-service POST inside the company should require one.
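A sketch of the client side of that discipline, with a hypothetical gateway URL and payload; the load-bearing detail is that the key is minted once, before the first attempt, and reused verbatim on every retry (Stripe's header is literally named Idempotency-Key).

import uuid
import requests

def create_charge(amount, currency, merchant):
    key = str(uuid.uuid4())            # minted ONCE, before the first attempt
    payload = {"amount": amount, "currency": currency, "merchant": merchant}
    for attempt in range(5):
        try:
            resp = requests.post(
                "https://api.example-gateway.test/v1/charges",  # hypothetical
                json=payload,
                headers={"Idempotency-Key": key},  # same key on every retry
                timeout=5,
            )
            return resp.json()
        except requests.exceptions.RequestException:
            continue                   # retry is safe: server dedups on key
    raise RuntimeError("gateway unreachable; charge state unknown")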
The transactional outbox — exactly-once across two systems without 2PC
When a service must atomically (a) write to its database and (b) emit a message to a broker, two-phase commit is one option but introduces all of 2PC's coordinator-failure pathologies. The transactional outbox pattern sidesteps 2PC: the service writes to its database and writes a row to an outbox table in the same database, all in one local transaction. A separate process — typically a CDC connector tailing the database's WAL — reads the outbox table and publishes to the broker. If the local transaction commits, both the state change and the outbox row are durable; if it aborts, neither is. The CDC connector achieves at-least-once publication, and the consumer dedups on the outbox row's UUID. PaySetu uses Debezium tailing Postgres's WAL into Kafka with outbox_id threaded into the Kafka headers; downstream consumers dedup on outbox_id in their database tables. The pattern delivers exactly-once across the database-and-broker boundary without any distributed transaction, at the cost of one extra table and one CDC connector. The original write-up is from Microservices.io; the technique is older and folklore-attributed to LinkedIn.
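A minimal sketch of the write path and a stand-in publisher, using SQLite; in production the publisher is the CDC connector tailing the WAL, not a polling loop, and the schema is illustrative.

import json
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (outbox_id TEXT PRIMARY KEY, payload TEXT,
                         published INTEGER DEFAULT 0);
""")

def place_order(order_id):
    with db:   # one local transaction covers both writes
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        db.execute("INSERT INTO outbox (outbox_id, payload) VALUES (?, ?)",
                   (str(uuid.uuid4()),
                    json.dumps({"event": "order_placed", "order_id": order_id})))

def publish_pending(send_to_broker):
    # At-least-once publication: mark published only after the send succeeds;
    # a crash in between republishes, and consumers dedup on outbox_id.
    rows = db.execute(
        "SELECT outbox_id, payload FROM outbox WHERE published = 0").fetchall()
    for outbox_id, payload in rows:
        send_to_broker(outbox_id, payload)
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE outbox_id = ?",
                       (outbox_id,))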
Why FLP impossibility lurks behind exactly-once too
The Two Generals proof rules out exactly-once delivery over a lossy channel even when timing is synchronous; the FLP impossibility result (Fischer, Lynch, Paterson, 1985) rules out deterministic agreement among asynchronous processes when even one process can crash. Exactly-once processing across multiple systems is, structurally, an agreement problem — every participant must agree on whether the message has been processed. So FLP applies: in a fully asynchronous environment with even one fail-stop process, no protocol guarantees exactly-once processing in finite time. Real systems escape FLP by using timeouts (failure detection becomes possible at the cost of accepting some false positives) and by accepting probabilistic guarantees rather than deterministic ones. When somebody says "we have exactly-once", what they technically mean is "we have exactly-once with high probability under the synchrony assumptions our timeouts encode, and a recovery procedure for the cases where the timeouts lie". This is why the academically careful framing is exactly-once processing under the same failure assumptions as the rest of your system, not exactly-once unconditionally.
CricStream's commentary pipeline and why duplicates were unacceptable
CricStream broadcasts live commentary text to 25M concurrent viewers during a cricket final, with each ball generating an event ("six over deep mid-wicket"). Early on, the system was at-least-once with no dedup, which meant duplicate commentary appeared on viewers' phones every few minutes during a high-load match. The fix was a Kafka-EOS pipeline from match-feed to commentary-render, plus an idempotency-key column on the commentary-store table keyed by (match_id, ball_id). The pipeline now delivers exactly-once: even if the match-feed producer retries, even if a commentary-renderer crashes mid-render, the viewer's screen shows each ball exactly once. The expensive part was not the Kafka EOS configuration (a few config flags) but the discipline of threading (match_id, ball_id) through every layer of the system as the dedup key. The technique only works if the key is preserved end-to-end; one service that drops the key blows the guarantee for the entire chain.
The "exactly-once is impossible" Tyler Treat post and why it stuck
Tyler Treat's 2015 essay You Cannot Have Exactly-Once Delivery argued, correctly, that exactly-once delivery over an unreliable network is impossible, and quoted the Two Generals proof as the reason. The essay was widely read, widely misunderstood, and produced a generation of engineers who heard "exactly-once is impossible" and concluded "anybody who claims exactly-once is lying". The nuance — delivery is impossible but processing is achievable — was lost in the headline. Confluent's response, three years later, was a series of blog posts framing Kafka EOS in terms of processing, not delivery, and the framing has stabilised since. The takeaway: when somebody says exactly-once, ask "delivery or processing?" and the conversation immediately becomes productive instead of theological.
Choosing the dedup window — the constraint nobody writes down
The hardest engineering parameter in a dedup-table design is not the schema or the storage backend; it is the retention window. Set it too short and stale duplicates leak past the dedup boundary and corrupt state. Set it too long and the dedup table grows without bound and starts dominating storage and lookup cost. The honest derivation goes: identify every layer that can produce a delayed duplicate (broker retries, consumer crashes with offset rewind, manual replays, cross-region failovers), take the worst-case window of each, and double the longest as a safety multiplier. PaySetu's audit on this came out to 14 minutes from broker-level retries, 5 minutes from consumer crash and rebalance, 60 minutes from cross-region failover, and 24 hours from "operator manually replays a window of events" — the manual-replay path dominates, so the dedup window for cross-region-aware paths is 48 hours, and the daily storage cost is treated as a line item. The principle generalises: the dedup window is the longest tail of every retry path that touches the data, not the average. Sizing for the average produces a system that is correct most of the time, which in this domain is indistinguishable from incorrect.
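The same derivation as arithmetic — the figures are the hypothetical PaySetu numbers from above, but the shape of the calculation is the point: take the worst tail, not an average, and double it.

# Worst-case delayed-duplicate window per retry path, in minutes.
retry_windows_minutes = {
    "broker-level retries":       14,
    "consumer crash + rebalance":  5,
    "cross-region failover":      60,
    "manual operator replay": 24 * 60,
}
dominant = max(retry_windows_minutes.values())  # manual replay dominates
window_hours = 2 * dominant / 60                # 2x safety multiplier
print(window_hours)                             # 48.0 — the dedup window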
Why "at-most-once" still ships in production — telemetry and metrics
Not every workload wants at-least-once. Telemetry pipelines — application metrics, request traces, log lines — often choose at-most-once on purpose, because the cost of duplicate-handling exceeds the value of any one data point. AutoGo's metrics pipeline drops UDP datagrams under load rather than retry, and the dashboard tolerates the loss because per-second aggregates are still meaningful at 99% sample fidelity. The honest framing is that delivery semantics are a per-workload choice: payments demand exactly-once processing, audit logs demand at-least-once with dedup at query time, and metrics often demand at-most-once with smoothing. Treating "at-least-once everywhere" as the default is itself a design mistake when the workload genuinely tolerates loss; the retry budget is real cost.
Where this leads next
The next chapters in Part 15 cover the specific broker-side mechanics that make exactly-once processing feasible — Kafka's transactional API, consumer-group rebalancing, and the change-data-capture pattern that powers the transactional outbox.
If you came here from the queues-vs-streams chapter, the bridge is that exactly-once processing is dramatically easier on streams (offset-based) than on queues (ack-based), because the offset gives you an atomic commit point that the ack-based model lacks. The full chain is: /wiki/queues-vs-streams-the-fundamental-split → this chapter → broker-specific mechanics.
The deeper rabbit hole is event sourcing, where every state change is itself a message in a stream and the entire system is replayable. Event sourcing makes exactly-once processing structurally trivial — duplicates are just ignored on replay because the dedup key is the event's ID — but trades a different set of complexity around projection rebuilds and schema migrations. The chapter that picks that up is /wiki/event-sourcing-the-log-as-truth.
A practical adjacent topic is outbox-driven CDC pipelines: most production "exactly-once" architectures end up combining a transactional outbox in the source database with a CDC connector publishing to Kafka and idempotent dedup-keyed handlers on the receiving side. The chain composes three of the techniques in this chapter into one production-tested pipeline. See /wiki/cdc-debezium-and-the-wal for the connector mechanics.
References
- Tyler Treat, You Cannot Have Exactly-Once Delivery, Brave New Geek 2015 — the canonical "delivery is impossible" essay.
- Wang & Narkhede, Exactly-Once Semantics Are Possible: Here's How Kafka Does It, Confluent 2017 — the response framing EOS in terms of processing.
- Fischer, Lynch & Paterson, Impossibility of Distributed Consensus with One Faulty Process, JACM 1985 — the FLP impossibility result.
- Gray, Notes on Data Base Operating Systems, IBM 1978 — early formal treatment of transactional semantics.
- Microservices.io, Pattern: Transactional Outbox — the canonical write-up of the outbox pattern.
- Kafka KIP-98: Exactly Once Delivery and Transactional Messaging — the design document for Kafka EOS.
- /wiki/queues-vs-streams-the-fundamental-split — the previous chapter; why offset-based primitives make EOS tractable.
- /wiki/wall-atomic-broadcast-needs-ordering — the ordering primitive that EOS depends on inside the broker.