Note: Company names, engineers, incidents, numbers, and scaling scenarios in this article are hypothetical — even when they resemble real ones. See the full disclaimer.

At-least-once + idempotency in practice

PaySetu's payouts service was advertised internally as exactly-once for two years before anyone noticed it was running at-least-once with a Redis dedup table that had silently been evicting under memory pressure since the previous Diwali. The first signal was a merchant in Coimbatore raising a ticket because his ₹14,200 refund had landed twice — once on Wednesday at 02:11 and again on Wednesday at 02:13. Asha, the on-call SRE, pulled the request log and found that the same payment_id had been processed by two different consumer pods 90 seconds apart, after a producer retry on a brief network blip. The dedup key had been written to Redis the first time, but a maxmemory-policy of allkeys-lru had quietly evicted it 73 seconds later because a separate hot-key workload was hammering the same cluster. The architecture diagram still said exactly-once. The reality was at-least-once with a leaky dedup window. That gap — between the marketing word and the running system — is what this chapter is about.

This chapter is about the recipe that actually ships in production: at-least-once delivery plus idempotency. Not the theory of why exactly-once delivery is impossible (covered in /wiki/exactly-once-and-the-semantics-debate) — but the engineering: where the idempotency key comes from, what storage holds the dedup state, how long the retention window has to be, and which failure modes will eventually find you.

At-least-once + idempotency is the workhorse pattern for production messaging. The producer keeps retrying until it gets an ack; the consumer absorbs duplicates by keying every state mutation on a stable, message-derived ID. The two engineering decisions that decide whether you survive are where the idempotency key comes from (client-generated > broker-generated > content-hash) and how long the dedup state is retained (longer than the longest plausible duplicate window — typically 24h–7d). Get either wrong and you ship duplicates that arrive once a quarter and corrupt money.

What "at-least-once" actually means on the wire

At-least-once is not a configuration flag — it is the emergent behaviour of any sender that retries on timeout. The moment you write producer.send(msg, retries=3) or client.post(url, retry=Retry(total=5)), you are running at-least-once. The receiver may see the message zero, one, two, or more times, depending on where the failure landed.
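
To make that concrete, here is a small, self-contained sketch of how a retry-on-timeout loop mints duplicates (no real broker; send_to_broker and the ack-loss rate are made up): the broker persists the write, the ack is lost on the way back, and the only safe thing the sender can do is send again.

# retry_duplicates.py: illustrative sketch, no real broker, a lost ack forces a resend.
import random

broker_log = []    # what the "broker" has durably persisted

def send_to_broker(msg, ack_loss_rate=0.5):
    """Persist msg, then maybe lose the ack on the way back to the sender."""
    broker_log.append(msg)                  # the broker-side write happens first
    if random.random() < ack_loss_rate:
        raise TimeoutError("ack lost")      # the sender only ever sees a timeout

def send_at_least_once(msg, retries=3):
    for attempt in range(1 + retries):
        try:
            send_to_broker(msg)
            return
        except TimeoutError:
            continue                        # retry on timeout: the duplicate source
    raise RuntimeError("delivery gave up")

random.seed(1)
send_at_least_once({"id": "txn-042", "amount": 500})
print("copies on the broker:", len(broker_log))   # > 1 whenever an ack was lost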

There are five places a message can be duplicated on its journey from producer to consumer state, and a production-grade design has to deal with all five.

[Figure: The five duplicate-injection points in an at-least-once pipeline. A horizontal flow from producer through broker to consumer to database, with duplicates injected at (1) app-level retry after a timeout, (2) producer ack lost, message resent, (3) broker fail-over replaying the log, (4) consumer crash before offset commit, and (5) database ack lost, consumer retries. "At-least-once" means the consumer must absorb a duplicate from any of these points with no observable change to state. Illustrative, not measured.]
The five places a message can be duplicated between producer and database. Any one of them can fire alone; in production, two firing within the same week is normal. Illustrative.

Why each of these is a real, observed failure and not a paranoid edge case:

  • Point 1 fires whenever a TCP connection drops mid-write: the producer's send() raises, the application catches it, and the retry happens after the broker has already received and persisted the original.
  • Point 2 is the same thing seen from the broker side: the broker has the message, the ack packet got lost, and the producer never saw the ack.
  • Point 3 is what happens during a broker leadership change in Kafka or a replica promotion in RabbitMQ: the in-flight messages are re-delivered from the new leader's log.
  • Point 4 is the most common in real outages: the consumer pulled the message, processed it, was about to commit the offset, the JVM hit a 4-second GC pause, the broker session timeout fired, the partition was reassigned to another consumer, and that consumer pulled the same message.
  • Point 5 is the database equivalent: the INSERT succeeded, the database's reply got lost, the application code threw, and on retry the row already exists.

You cannot eliminate any of these without changing the underlying delivery semantics; you can only make the consumer absorb them.

Where the idempotency key comes from — and why it matters more than the retention

The single most important engineering decision in an at-least-once + idempotency system is where the idempotency key originates. Get it wrong and the dedup table will absorb some duplicates but not others, in ways that depend on which network hop dropped the packet — which is to say, in ways that are non-reproducible and devastating.

There are four common sources, in descending order of robustness.

Client-generated key (best). The originating client — a mobile app initiating a payment, a merchant POS terminal, a backend microservice that started the workflow — generates a UUID or content-derived hash and includes it as the idempotency key. The key travels with the message through every hop. Every retry — at every layer — uses the same key. The dedup window only has to cover the time between the first successful arrival and the last possible retry, which is finite and bounded. PaySetu's mobile SDK generates a v4 UUID per payment intent, stores it in the device's secure enclave keyed on the user's session, and re-uses it across app restarts within the same payment. A user who taps "Pay ₹2,500" twice in rapid succession produces two different keys (intentional retries are different intents). The phone losing network mid-request and the SDK retrying internally produces the same key. This is the only design where the dedup is end-to-end correct.

Per-hop key (acceptable for one hop). The producer near the broker generates the key (e.g. Kafka's enable.idempotence=true produces a (producer-id, sequence-number) per partition). This deduplicates inside the broker — two sends of the same (pid, seq) to the same partition are collapsed — but a producer restart generates a new pid, so the broker dedup window resets on every redeploy. The downstream consumer cannot rely on this key for cross-broker dedup; it has to layer its own. Per-hop keys are useful for what they cover (broker write idempotence) but not sufficient by themselves.

Broker-generated key (the offset). The broker assigns a unique offset to every record it persists. The consumer can use the offset as a dedup key. This works for consumer-side dedup of broker redelivery (point 4 in the diagram above) but does not cover producer-side duplicates (points 1, 2) — two producer retries that the broker accepts as separate writes will get different offsets and look like distinct messages. Offset-based dedup is necessary but insufficient.

Content hash (last resort). When no key was provided, hash the message body and dedup on the hash. Two identical payloads will collide and you will silently drop a legitimate second send (e.g. two ₹500 transfers from the same account in the same minute). Use this only when you can prove no two legitimate messages will ever have identical content — typically because the content includes a timestamp at sub-millisecond granularity, or a strictly-monotonic sequence number generated upstream. Content-hash dedup is a code smell; if you find yourself reaching for it, you have an upstream design problem.

Why client-generated keys are the only end-to-end correct option: every retry mechanism between the client and the final state-mutating system is a potential duplicate source. App-level retry, gRPC retry, Kafka producer retry, broker fail-over redelivery, consumer-pod restart, database connection retry — there are typically five to seven retry layers stacked on top of each other in a production payments flow. A key generated anywhere short of the originating client cannot deduplicate retries that happen earlier than its origin. A key generated at the database layer cannot deduplicate a Kafka producer retry that resulted in two records on the topic. A key generated at the consumer cannot deduplicate two producer retries that the broker accepted as separate writes. The client is the only point upstream of every retry in the chain, so only a key born there is guaranteed to mark the same logical operation across every duplicate path.
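
A minimal sketch of what that looks like from the client side, assuming a hypothetical /payments endpoint that accepts an Idempotency-Key header: the key is minted once, before the first attempt, and every retry carries the same value.

# client_key_retry.py: sketch of a client-minted key reused across every retry.
# The endpoint URL and payload shape are hypothetical.
import time
import uuid

import requests

def pay(amount_inr, account, base_url="https://api.example.test"):
    key = str(uuid.uuid4())                        # born at the originating client, once
    payload = {"account": account, "amount": amount_inr}
    for attempt in range(5):
        try:
            resp = requests.post(
                f"{base_url}/payments",
                json=payload,
                headers={"Idempotency-Key": key},  # same key on every attempt
                timeout=3,
            )
            if resp.status_code < 500:
                return resp.json()
        except requests.RequestException:
            pass                                   # timeout, reset, DNS blip: retry below
        time.sleep(2 ** attempt)                   # back off, then resend with the SAME key
    raise RuntimeError("payment retries exhausted")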

A code walkthrough — the dedup table that actually works

Here is a runnable, end-to-end demonstration of an at-least-once + idempotency consumer. By the time a duplicate reaches the handler it looks the same regardless of which of the five injection points produced it, so a single keyed check absorbs them all. The code uses an in-memory dedup table for the demo (a real implementation would put it in the same database transaction as the mutation, or in Redis with persistence) and applies state mutations only when the key is novel.

# at_least_once_consumer.py — runnable, demonstrates duplicate absorption.
import time, random
from collections import defaultdict

# Wallet state (the system of record).
balance = defaultdict(int)
# Dedup table — message_id -> (applied_at_unix, prior_balance_for_audit).
# In production: shared with the wallet DB via a single transaction, OR
# Redis with persistence + 24h TTL.
seen = {}
DEDUP_WINDOW_S = 24 * 3600  # 24 hours

def handle(msg, now_s):
    """Idempotent handler — safe to call N times for the same msg['id']."""
    mid = msg["id"]
    # Garbage-collect old dedup entries (cheap if window is bounded).
    expired = [k for k, (t, _) in seen.items() if now_s - t > DEDUP_WINDOW_S]
    for k in expired: del seen[k]

    if mid in seen:
        return ("duplicate-absorbed", balance[msg["acct"]])

    # Atomic in real life: same transaction as the balance UPDATE.
    seen[mid] = (now_s, balance[msg["acct"]])
    balance[msg["acct"]] += msg["amount"]
    return ("applied", balance[msg["acct"]])

# Simulate the wire: each message has a 30% chance of being delivered twice.
random.seed(7)
events = [
    {"id": "txn-001", "acct": "riya",  "amount": 1500},
    {"id": "txn-002", "acct": "rahul", "amount":  900},
    {"id": "txn-003", "acct": "riya",  "amount":  200},
    {"id": "txn-004", "acct": "asha",  "amount": 4500},
    {"id": "txn-005", "acct": "rahul", "amount":  100},
]
delivered = []
for e in events:
    delivered.append(e)
    if random.random() < 0.3:
        delivered.append(dict(e))  # duplicate on the wire

now = time.time()
print(f"messages on wire: {len(delivered)} (originals: 5)")
for m in delivered:
    status, bal = handle(m, now)
    print(f"  {m['id']:8s} acct={m['acct']:5s} amount={m['amount']:5d}"
          f"  -> {status:18s}  balance={bal}")

print("\nfinal balances:", dict(balance))
print("dedup table size:", len(seen))

Sample run:

messages on wire: 7 (originals: 5)
  txn-001  acct=riya  amount= 1500  -> applied             balance=1500
  txn-002  acct=rahul amount=  900  -> applied             balance=900
  txn-003  acct=riya  amount=  200  -> applied             balance=1700
  txn-003  acct=riya  amount=  200  -> duplicate-absorbed  balance=1700
  txn-004  acct=asha  amount= 4500  -> applied             balance=4500
  txn-005  acct=rahul amount=  100  -> applied             balance=1000
  txn-005  acct=rahul amount=  100  -> duplicate-absorbed  balance=1000

final balances: {'riya': 1700, 'rahul': 1000, 'asha': 4500}
dedup table size: 5

A walkthrough of the load-bearing lines:

  • if mid in seen: return ("duplicate-absorbed", ...) — this is the entire idempotency mechanism. Everything else is bookkeeping. The check must run before any state mutation, and the dedup-table write must be in the same transaction as the mutation (otherwise two consumers can both pass the check, both write the dedup row, and both apply the mutation).
  • seen[mid] = (now_s, balance[msg["acct"]]) — the dedup row records when the message was first applied and (optionally) the prior balance for audit. The audit field is what lets you reconstruct what would have happened without dedup — useful for reconciliation.
  • expired = [k for k, (t, _) in seen.items() if now_s - t > DEDUP_WINDOW_S] — the retention sweep. In production this is a TTL on the Redis key or a periodic DELETE FROM dedup WHERE created < NOW() - INTERVAL '24 hours'. Get this wrong and the table grows without bound.
  • if random.random() < 0.3: delivered.append(dict(e)) — the simulated duplicate. Real producers do not duplicate this often, but the rate is non-zero — for a Kafka producer with acks=all and retries=10, you should expect a baseline duplicate rate of ~0.1% during steady state and 1–5% during broker fail-overs.

Why the dedup-table write and the state mutation must share a transaction: if they are in separate transactions, concurrent consumers can all read mid not in seen, all decide to apply, then race on the writes. One wins on the dedup table, the others raise a primary-key conflict — but by then the state mutation may have already been applied (depending on which one your code does first). The clean pattern is INSERT INTO dedup_table (msg_id) VALUES (?) ON CONFLICT DO NOTHING RETURNING msg_id; if nothing is returned, stop; otherwise UPDATE wallet SET balance = balance + ? WHERE acct = ? — both inside one DB transaction, with the dedup INSERT first. The conflict on the dedup primary key is what serialises duplicate handlers; the wallet update only runs on the winner.
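
A sketch of that pattern with psycopg2 (the dedup_table and wallet schemas are illustrative assumptions, not a real system's):

# dedup_txn.py: sketch of the one-transaction pattern with psycopg2.
# Table and column names (dedup_table, wallet) are illustrative assumptions.
import psycopg2

def apply_credit(conn, msg_id, acct, amount):
    """Idempotent credit; returns 'applied' or 'duplicate-absorbed'."""
    with conn:                                     # one transaction: commit or rollback as a unit
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO dedup_table (msg_id) VALUES (%s) "
                "ON CONFLICT (msg_id) DO NOTHING RETURNING msg_id",
                (msg_id,),
            )
            if cur.fetchone() is None:             # conflict: a previous handler already won
                return "duplicate-absorbed"
            cur.execute(
                "UPDATE wallet SET balance = balance + %s WHERE acct = %s",
                (amount, acct),
            )
            return "applied"

# Usage (connection parameters are placeholders):
# conn = psycopg2.connect("dbname=payments")
# apply_credit(conn, "txn-001", "riya", 1500)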

Retention windows — the parameter that owns your duplicate rate

The dedup table works only if its retention window is at least as long as the longest possible duplicate window. Get this wrong and the table is a placebo.

The longest plausible duplicate window is the maximum time between a message's first arrival and its last possible re-arrival. In a Kafka pipeline that bound is roughly: producer retry timeout (often delivery.timeout.ms = 120000 = 2 min) + broker replication lag (typically <1 s but can spike to 30 s during fail-over) + consumer session timeout (max.poll.interval.ms defaults to 5 min) + downstream retry budget (often hours). The dominating term is the downstream retry budget — if a consumer schedules a retry on the dead-letter queue with a 24-hour cron, the duplicate window is 24 hours.
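
Plugging in the illustrative numbers above (the downstream retry budget is an assumption), the bound works out roughly like this:

# duplicate_window.py: back-of-envelope bound using the illustrative numbers above.
producer_retry_s    = 120          # delivery.timeout.ms = 120000
replication_spike_s = 30           # worst-case fail-over lag (assumed)
consumer_session_s  = 300          # max.poll.interval.ms default, 5 min
downstream_retry_s  = 24 * 3600    # daily DLQ replay cron (assumed)

duplicate_window_s = (producer_retry_s + replication_spike_s
                      + consumer_session_s + downstream_retry_s)
retention_floor_s = 2 * duplicate_window_s

print(f"duplicate window ~ {duplicate_window_s / 3600:.1f} h")   # ~ 24.1 h
print(f"retention floor  ~ {retention_floor_s / 3600:.1f} h")    # ~ 48.2 h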

The right retention window is at least 2× the longest plausible duplicate window. KapitalKite's order-routing service uses 24-hour Redis TTL because their failed-order replay job runs on a daily cron. CricStream's stream-processing pipeline uses a 7-day Postgres dedup table because their reconciliation batch runs weekly and may replay a week of failed events. PaySetu's payment processor uses a permanent dedup row in the same payments table as the payment row (the payment row is the dedup record — the INSERT ... ON CONFLICT DO NOTHING on the (idempotency_key) unique index serves both functions).

System                    Retention   Storage                  Reason
PaySetu payments          permanent   Postgres unique index    Money — never silently drop
KapitalKite orders        24 h        Redis TTL                Daily replay cron
CricStream events         7 d         Postgres table           Weekly reconciliation
MealRush notifications    1 h         Redis TTL                Notifications are best-effort
BharatBazaar inventory    6 h         Redis TTL                Inventory reconciles every 6 h

Why "permanent" is sometimes the right answer for money: the cost of holding a dedup row forever is pennies — typically 100 bytes per payment for the key plus a metadata column. A bank that processes ten million payments a day adds ~1 GB of dedup metadata per day, ~365 GB per year, which fits on a single laptop SSD. The cost of getting the retention wrong even once is a duplicate ₹14,200 refund landing in a merchant's account — and at scale, dozens of these per quarter, all of which require manual reconciliation, customer-support tickets, and reputation damage. The economics overwhelmingly favour permanent retention for any state mutation involving money or legal record. Use TTL only for state where re-applying is genuinely cheap and idempotent in semantics (e.g. "user opened notification — increment a counter that's already approximate").

The four production failure modes — and how to spot each in dashboards

After two years of running this pattern at PaySetu / CricStream / KapitalKite scale, four failure modes account for nearly every duplicate that escapes the dedup layer.

[Figure: Dedup-table check vs apply, the transaction boundary that matters. Left column, the correct design: the dedup INSERT and the state UPDATE run inside one transaction (BEGIN; INSERT INTO dedup (mid) ...; UPDATE wallet SET ...; COMMIT), so both happen or neither does and a duplicate cannot escape. Right column, the broken design: a Redis SET of the dedup key (TTL 24 h) and a separate Postgres UPDATE run as two transactions, with a crash window between them. Illustrative.]
The dedup INSERT and the state UPDATE must share a transaction. Putting them in two stores opens a crash window where one commits and the other does not — the canonical source of duplicates that escape "we have a dedup table" defenses. Illustrative.

1. Dedup-table eviction under memory pressure. Redis with maxmemory-policy set to anything other than noeviction will silently drop dedup entries to make room for hot keys. The first you learn of it is a customer ticket. The fix is the noeviction policy (refuse new writes, alert on memory) plus a separate Redis instance for dedup — never share the dedup cluster with anything else. Spot it with a metric: redis_evicted_keys_total{instance="dedup"} > 0 is a page-immediately alert; a guard sketch follows after this list.

2. Idempotency-key skew across retry layers. The mobile app generates key=A, the API gateway wraps it in a request and adds its own request-id=B, and the downstream worker keys its dedup on B instead of A. A retry from the mobile app produces a new B but the same A — your dedup misses it. Spot it with: log the idempotency key at every hop and run a tracing query that counts distinct request-ids per idempotency key over 1 hour — anything > 1 is a leak.

3. Producer restart resets the dedup state. Kafka's idempotent producer (pid, seq) resets on every producer restart, so a producer that crashes and restarts mid-batch will resend with a new pid and the broker will accept it as a fresh write. The downstream consumer sees the application-level idempotency key (good), but the broker's own count of "messages produced" is inflated. Spot it with: monitor kafka_producer_record_send_total against application-level distinct-message-id count — divergence means producer restarts are minting duplicates.

4. Asymmetric retention between the dedup table and the source-of-truth. The dedup table has a 24-hour TTL, but the upstream system can replay failed messages from a 7-day dead-letter queue. On day 5 the DLQ replays — the dedup table has long since expired its entry — and the message is processed again. Spot it with: every dedup table must have a documented retention floor, and the floor must be at least 2× the longest upstream replay window. CI lint: parse the YAML config for both, fail the deploy if the floor is violated.
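
For failure mode 1, a sketch of the guard with redis-py, assuming a dedicated dedup instance at a hypothetical address (the alerting hook is up to you):

# dedup_redis_guard.py: sketch of the eviction guard with redis-py.
# The host name and how you page on failure are assumptions.
import redis

r = redis.Redis(host="dedup-redis.internal", port=6379, decode_responses=True)

def check_dedup_instance():
    policy = r.config_get("maxmemory-policy")["maxmemory-policy"]
    if policy != "noeviction":
        raise RuntimeError(f"dedup Redis must run noeviction, found {policy!r}")
    evicted = int(r.info("stats")["evicted_keys"])
    if evicted > 0:
        # page immediately: dedup entries have been silently dropped
        raise RuntimeError(f"dedup Redis has evicted {evicted} keys")
    return "ok"

And for failure mode 4, a sketch of the CI lint, assuming a hypothetical config layout where both values live in the same YAML file:

# retention_lint.py: CI sketch, fail the build when the retention floor is violated.
# The YAML keys (dedup_retention_hours, max_replay_hours) are hypothetical.
import sys

import yaml

def lint(path):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    retention = cfg["dedup_retention_hours"]
    replay = cfg["max_replay_hours"]
    if retention < 2 * replay:
        sys.exit(f"FAIL: dedup retention {retention}h < 2x replay window {replay}h")
    print(f"OK: retention {retention}h covers 2x replay window {replay}h")

if __name__ == "__main__":
    lint(sys.argv[1])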

Common confusions

  • "Idempotency means the operation has no side effect." No — it means applying the operation twice has the same observable effect as applying it once. A wallet credit has a side effect (the balance changes), but a credit keyed by txn-id and gated by an idempotency check is idempotent: the second call observes the prior credit and returns "already applied".
  • "At-least-once + dedup is the same as exactly-once." From the user's point of view, yes. From the engineering point of view, no — exactly-once delivery is impossible (see /wiki/exactly-once-and-the-semantics-debate), so what you ship is exactly-once processing layered on top of at-least-once delivery. The distinction matters because every layer below your dedup point still sees the duplicates.
  • "You can dedup on request-id from the API gateway." Only if the gateway preserves the same request-id across retries — most don't. Generate the idempotency key at the originating client (mobile app, POS terminal, upstream service) and propagate it explicitly as a header.
  • "TCP guarantees no duplicates." TCP guarantees no duplicates within a single connection. Application-level retries open new connections, broker fail-overs send messages from new replicas, consumer pod restarts re-read from the broker — every one of these crosses the TCP boundary and TCP has no opinion.
  • "The dedup table can be eventually consistent." Only if your state mutation is also eventually consistent and commutative. For monetary operations, the dedup table must share a transaction with the state mutation — eventual consistency means a window where two consumers both see "not yet applied" and both apply.

Going deeper

Idempotent operations vs idempotent state — the design choice

There are two ways to make the consumer absorb duplicates. Idempotent operation: design the operation so applying it twice is naturally a no-op. UPSERT INTO inventory (sku, qty) VALUES ('SKU-9', 100) is idempotent — running it twice leaves qty=100. UPDATE inventory SET qty = qty + 100 WHERE sku = 'SKU-9' is not. Idempotent state: keep a dedup table and gate the operation on a key. This works for any operation, including non-idempotent ones like increments. Idempotent operations are cheaper (no extra storage) but require you to redesign the operation; idempotent state is universal but costs the dedup retention. Most production systems use a mix — increments and counters get dedup tables; configuration writes and snapshots use UPSERT.
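
A minimal sqlite3 sketch of the difference: replaying the absolute write is harmless, replaying the relative one corrupts the count.

# idempotent_op.py: an absolute UPSERT absorbs a replay, a relative increment does not.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER)")

def set_qty(sku, qty):      # idempotent operation: absolute write
    db.execute(
        "INSERT INTO inventory (sku, qty) VALUES (?, ?) "
        "ON CONFLICT(sku) DO UPDATE SET qty = excluded.qty",
        (sku, qty),
    )

def add_qty(sku, delta):    # NOT idempotent: relative write
    db.execute("UPDATE inventory SET qty = qty + ? WHERE sku = ?", (delta, sku))

set_qty("SKU-9", 100)
set_qty("SKU-9", 100)       # replayed message, qty is still 100
add_qty("SKU-9", 100)       # qty = 200
add_qty("SKU-9", 100)       # replayed message, qty is now 300: corruption
print(db.execute("SELECT qty FROM inventory WHERE sku = 'SKU-9'").fetchone())  # (300,)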

Kafka's idempotent producer — what it actually buys

enable.idempotence=true in the Kafka producer turns on a (producer-id, sequence-number) mechanism per partition. The broker tracks the last accepted sequence number per (pid, partition) and rejects out-of-order or duplicate sequence numbers. This eliminates duplicates introduced by the producer's own retry loop (point 1 in the figure earlier) — but only within the lifetime of a single producer instance. A producer restart generates a new pid and the broker treats it as a fresh client. For end-to-end idempotency you still need an application-level key on top. The combination — Kafka idempotent producer plus application-level dedup at the consumer — is the standard production stack. See org.apache.kafka.clients.producer.internals.TransactionManager for the implementation.
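
A producer-side configuration sketch with the confluent-kafka client (broker address, topic, and payload are illustrative); note that the end-to-end application key still has to travel inside the message itself:

# idempotent_producer.py: sketch of the producer-side switch with confluent-kafka.
# Broker address, topic, and payload are illustrative.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "broker-1.internal:9092",
    "enable.idempotence": True,   # (pid, seq) dedup inside the broker, per partition
    "acks": "all",                # required by idempotence, shown here for clarity
})

# The broker's (pid, seq) does not survive a producer restart, so the
# application-level idempotency key rides inside the payload.
producer.produce(
    "payments",
    key=b"txn-001",
    value=b'{"idempotency_key": "a-client-generated-uuid", "amount": 1500}',
)
producer.flush()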

Why the dedup table sits in the same database as the state

Distributed dedup tables (Redis, DynamoDB, an external KV) introduce a two-phase failure mode: the dedup write succeeds, the state mutation fails, the retry sees the dedup row and refuses to apply — the message is lost. Conversely, the state mutation succeeds, the dedup write fails, the retry re-applies — duplicate. The only way out is to put the dedup row and the state mutation in the same transaction. For SQL stores, this means an INSERT ... ON CONFLICT DO NOTHING RETURNING ... on a dedup table in the same DB. For NoSQL stores, this means a conditional update with the dedup key as a precondition. Cross-store dedup (Redis dedup + Postgres state) is the most common cause of the rare-but-real duplicate that escapes — every team that runs this pattern has a postmortem about it.
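
For the NoSQL case, a sketch with boto3 against DynamoDB (table and attribute names are assumptions), using the ledger-entry variant where the dedup row is itself the state record:

# conditional_dedup.py: sketch of a DynamoDB conditional put as the dedup gate.
# Table and attribute names are illustrative assumptions.
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("wallet_ledger")

def apply_credit(msg_id, acct, amount):
    try:
        # One write carries both the dedup key and the state change: the item
        # itself is the ledger entry, keyed on msg_id.
        table.put_item(
            Item={"msg_id": msg_id, "acct": acct, "amount": amount},
            ConditionExpression="attribute_not_exists(msg_id)",
        )
        return "applied"
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return "duplicate-absorbed"
        raise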

The Stripe idempotency-key API — the canonical public design

Stripe's API takes an Idempotency-Key HTTP header on every mutating request. The server maintains a 24-hour dedup window keyed on (api_key, idempotency_key). A retry within 24 hours with the same key returns the original response (status code, body, headers) byte-for-byte — even if the original failed. This is the shape every payment API converges on, and PaySetu's external API mirrors it because merchants expect it. The key engineering insight is that the response is also stored, not just the dedup row — a retry must return the same response the original returned, otherwise the merchant cannot tell whether their first attempt succeeded.
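
A toy sketch of that server-side shape, with an in-memory store and a stand-in charge() helper (not Stripe's implementation): the stored value is the full response, so a retry replays it verbatim.

# idempotent_endpoint.py: toy sketch of storing the response alongside the dedup key.
# The in-memory store and the charge() helper are stand-ins, not a real implementation.
import time

RESPONSES = {}              # (api_key, idempotency_key) -> (status, body, stored_at)
WINDOW_S = 24 * 3600

def charge(payload):        # stand-in for the real payment logic
    return 201, {"charged": payload["amount"], "currency": "INR"}

def handle_payment(api_key, idempotency_key, payload, now=None):
    now = now if now is not None else time.time()
    k = (api_key, idempotency_key)
    cached = RESPONSES.get(k)
    if cached and now - cached[2] < WINDOW_S:
        return cached[0], cached[1]        # replay the original response verbatim
    status, body = charge(payload)
    RESPONSES[k] = (status, body, now)     # store the outcome even if it was a failure
    return status, body

print(handle_payment("mk_test", "key-1", {"amount": 2500}))   # applied
print(handle_payment("mk_test", "key-1", {"amount": 2500}))   # same tuple, replayed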

Reproduce this on your laptop

python3 -m venv .venv && source .venv/bin/activate
# No external deps — pure Python.
python3 at_least_once_consumer.py
# To see the eviction failure mode:
# - run Redis with maxmemory=10mb maxmemory-policy=allkeys-lru
# - load a hot-key workload, observe dedup keys getting evicted
docker run -d -p 6379:6379 redis:7 redis-server --maxmemory 10mb --maxmemory-policy allkeys-lru

Where this leads next

The next chapter in this part covers the transactional outbox pattern — the technique production services use to bind a database write and a message publish into one atomic step, so that the publish itself becomes idempotent at the source.
