Idempotency keys

It is 11:48 pm and Riya is staring at a PaySetu support ticket: a customer paid ₹2,499 for a phone case, the merchant's app showed a spinner for nine seconds, then a red error, so the customer tapped Pay again — and now their bank statement shows two ₹2,499 debits 11 seconds apart. The merchant SDK retried automatically on a DEADLINE_EXCEEDED, the user retried manually, and somewhere in the network the original request actually succeeded after the SDK gave up. Three attempts went out. The payments service charged the card twice. The reason is not that PaySetu's engineers forgot retries are dangerous — they didn't. The reason is that retries without an idempotency key are dangerous, and on this code path the field was empty.

An idempotency key is a unique client-supplied identifier (typically a UUID) that the server stores alongside the result of executing a request. When the server sees the same key again, it does not re-execute — it returns the stored result. This is how at-least-once RPC delivery becomes effectively-once business semantics: the wire may deliver a request 1, 2, or 17 times, but the side effect (charging a card, creating an order, sending an SMS) happens once. The key, the durable store, and the conflict-resolution rule (200 vs 409 vs 422) are the three parts; missing any one breaks the property.

What "idempotent" actually means in production

An operation is idempotent when applying it twice has the same observable effect as applying it once. SET balance = 1000 is idempotent; balance = balance - 100 is not. In a single-process world this is a property of the operation. In a distributed system, where the network can deliver the same request 1, 2, or N times — and the receiver cannot distinguish "first delivery" from "redelivery" — idempotency is a property of the operation plus the receiver's memory of what it has already done.
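
The difference between the two statements can be seen in a minimal sketch. The `set_balance` and `debit` helpers and the account dict are hypothetical, introduced only to show what a redelivery does to each kind of operation.

```python
def set_balance(account: dict, value: int) -> None:
    """Idempotent: the final state does not depend on how many times it runs."""
    account["balance"] = value

def debit(account: dict, amount: int) -> None:
    """Not idempotent: every extra delivery changes the state again."""
    account["balance"] -= amount

a = {"balance": 5000}
set_balance(a, 1000)
set_balance(a, 1000)      # redelivery of the same request

b = {"balance": 5000}
debit(b, 100)
debit(b, 100)             # redelivery of the same request

print(a["balance"], b["balance"])   # 1000 4800
```

The receiver of `debit` needs memory of what it has already applied; the receiver of `set_balance` does not.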

The key insight is that most useful business operations are not naturally idempotent: charging a card, creating an order, transferring rupees, sending an SMS, dispatching a delivery rider. They mutate state, and applying them twice produces a different result from applying them once. The job of an idempotency key is to make a non-idempotent operation behave idempotently from the caller's point of view, by giving the receiver a way to recognise "I have already seen this request, here is the result I produced last time" without re-executing the side effect.

The same key on attempts 2 and 3 hits the dedup table before the card-network call. The card sees one authorisation; the client sees three identical 200 OK bodies. Illustrative — real PaySetu deduplication latency is ≈4 ms for the cache-hit path versus 380 ms p99 for the cache-miss (card-network) path.

The mental model is: the server keeps a small piece of paper with the key written on it and the result stamped beside it, and the first thing it does on every request is check that paper. Why the server, not the client, owns the dedup table: the client cannot know whether its previous attempt's bytes actually reached the server. From the client's perspective, a DEADLINE_EXCEEDED is indistinguishable from "the server processed the request and the response was lost" — the only authority that knows what actually happened is the server, and only by remembering keys can it tell you on retry.

Anatomy of a working idempotency key

A working idempotency key is not just a UUID in a header. It is a contract with three parts that must all hold simultaneously, or the property breaks.

Part 1: the key itself. Generated by the client before the request is sent, typically uuid4() or a deterministic hash of (user_id, order_id, request_seq). It must be stable across retries: the SDK that retries the same request must send the same key. This is why clients that auto-retry generate the key once at the call site and re-send it byte-for-byte on every retry within that call: Stripe's official client libraries do this with the Idempotency-Key header, AWS SDKs do it for APIs that take a client token, and gRPC's built-in retry policy re-sends the original request and metadata unchanged, so a key placed in metadata rides along.

Part 2: a durable dedup store. A row in a relational table, a Redis key with a TTL, a DynamoDB item: somewhere the server can write (key, request_hash, status, response_body, created_at) atomically and read it back later. The store must outlive a process restart; if the payments service crashes mid-charge and its dedup store is in process memory, the retry will not find the key and will charge again. PaySetu's payments-DB has an idempotency_keys table on the same Postgres primary that holds the charges table, so the dedup INSERT and the charge INSERT happen in the same transaction.

Part 3: a conflict-resolution rule. What does the server do when it sees a known key with a different request body? A naive implementation returns the cached response and moves on. A correct implementation hashes the request body (excluding non-deterministic fields like timestamps), stores the hash with the key, and on a key collision compares hashes. Same hash → return cached response. Different hash → return 409 Conflict or 422 Unprocessable Entity, because the client is reusing a key for a different operation, which is almost always a bug.
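
A minimal sketch of the hash comparison described in Part 3. The field names in `VOLATILE_FIELDS` and the `resolve` helper are assumptions made for illustration; which fields count as non-deterministic depends on the API.

```python
import hashlib
import json

# Hypothetical list of fields that change between retries of the same request.
VOLATILE_FIELDS = {"timestamp", "trace_id"}

def stable_hash(body: dict) -> str:
    """Hash only the fields that define the operation, in canonical form."""
    stable = {k: v for k, v in body.items() if k not in VOLATILE_FIELDS}
    canon = json.dumps(stable, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canon).hexdigest()

def resolve(stored_hash: str, body: dict) -> int:
    """Status code for a request whose key is already stored as completed."""
    return 200 if stable_hash(body) == stored_hash else 409

first = {"amount": 2499, "currency": "INR", "timestamp": 1700000000, "trace_id": "t-1"}
retry = {"amount": 2499, "currency": "INR", "timestamp": 1700000009, "trace_id": "t-2"}
other = {"amount": 9999, "currency": "INR", "timestamp": 1700000011, "trace_id": "t-3"}

stored = stable_hash(first)
print(resolve(stored, retry), resolve(stored, other))   # 200 409
```

The retry differs from the first attempt only in volatile fields, so it matches; the third body changes the amount, so it is rejected.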

The pending state is the trickiest — a retry that arrives while the original request is still mid-flight must not start a second side effect. Returning 409 (or blocking briefly) lets the original finish and stamp completed-ok before the retry sees the result.

The pending state is what most homemade implementations get wrong. Why pending matters: imagine the merchant SDK has a 200 ms timeout but the card-network call takes 380 ms. The SDK times out and retries at t=200 ms. The first attempt is still in flight on the server — the dedup table has status=pending for this key. If the server treats pending as absent and starts a second card-network call, you have a race: both calls might succeed and you charge twice. Treating pending as a hard 409 (or briefly blocking the retry until the original completes) is what makes the dedup safe under concurrent retries.
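
The "briefly blocking" option can be sketched as a bounded poll. This is one possible shape, not a prescribed one: `wait_for_original`, its timeouts, and the `FakeStore` stand-in for the dedup table are all hypothetical.

```python
import time
from typing import Callable, Tuple

def wait_for_original(lookup: Callable[[], str],
                      max_wait_s: float = 0.5,
                      poll_s: float = 0.01) -> Tuple[int, str]:
    """Poll the stored status; never start a second side effect while pending."""
    deadline = time.monotonic() + max_wait_s
    while True:
        status = lookup()
        if status != "pending":
            return 200, status                      # original finished; serve its result
        if time.monotonic() >= deadline:
            return 409, "in flight; retry shortly"  # give up waiting, still no re-execution
        time.sleep(poll_s)

class FakeStore:
    """Stand-in for the dedup table: reports pending for the first N lookups."""
    def __init__(self, pending_lookups: int):
        self.remaining = pending_lookups
    def status(self) -> str:
        if self.remaining > 0:
            self.remaining -= 1
            return "pending"
        return "ok"

r_fast = wait_for_original(FakeStore(3).status)
r_stuck = wait_for_original(FakeStore(10**9).status, max_wait_s=0.05)
print(r_fast, r_stuck[0])
```

Either outcome is safe: the retry gets the original's result or a 409, and in neither case does it trigger the card-network call again.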

Code: a deduping payments endpoint with conflict detection

This is the smallest faithful implementation: an SQLite-backed dedup table, atomic insert-or-fetch via a UNIQUE constraint on the key, request-body hashing for collision detection, and the four-state lookup logic in one function.

# idempotency.py — atomic dedup with conflict detection
import hashlib, json, sqlite3, time, uuid
from contextlib import contextmanager

DB = sqlite3.connect(":memory:", isolation_level=None)
DB.execute("""
CREATE TABLE idem (
  key TEXT PRIMARY KEY,
  req_hash TEXT NOT NULL,
  status TEXT NOT NULL,           -- pending | ok | err
  response_body TEXT,
  created_at REAL NOT NULL
)""")

def req_hash(body: dict) -> str:
    canon = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canon).hexdigest()[:16]

@contextmanager
def tx():
    DB.execute("BEGIN IMMEDIATE")  # serialise writers
    try:
        yield
        DB.execute("COMMIT")
    except BaseException:          # roll back on any failure, then re-raise
        DB.execute("ROLLBACK")
        raise

def charge(key: str, body: dict) -> tuple[int, dict]:
    h = req_hash(body)
    with tx():
        row = DB.execute("SELECT req_hash, status, response_body FROM idem "
                         "WHERE key = ?", (key,)).fetchone()
        if row is None:
            # first sighting: claim the key as pending before any side effect
            DB.execute("INSERT INTO idem VALUES (?,?,?,?,?)",
                       (key, h, "pending", None, time.time()))
        else:
            stored_hash, status, resp = row
            if stored_hash != h:
                return 409, {"error": "key reused with different body"}
            if status == "pending":
                return 409, {"error": "in flight; retry shortly"}
            return 200, json.loads(resp)         # cache hit, no side effect

    # ---- side effect runs OUTSIDE the dedup transaction ----
    # (simulate card-network call that succeeds; a failure path would need
    #  to flip the row to "err" so the key does not stay pending forever)
    auth = {"auth_id": "A" + uuid.uuid4().hex[:6], "amount": body["amount"]}

    with tx():
        DB.execute("UPDATE idem SET status=?, response_body=? WHERE key=?",
                   ("ok", json.dumps(auth), key))
    return 200, auth

# ---- exercise ----
key = "k7e21f9c"
print("attempt 1:", charge(key, {"amount": 2499, "card": "4111"}))
print("attempt 2:", charge(key, {"amount": 2499, "card": "4111"}))   # retry, same body
print("attempt 3:", charge(key, {"amount": 9999, "card": "4111"}))   # same key, diff body
print("attempt 4:", charge("k_other", {"amount": 2499, "card": "4111"}))
print("\nrows in dedup store:")
for r in DB.execute("SELECT key, req_hash, status, substr(response_body,1,40) FROM idem"):
    print(" ", r)

Sample run:

attempt 1: (200, {'auth_id': 'A4b9c1e', 'amount': 2499})
attempt 2: (200, {'auth_id': 'A4b9c1e', 'amount': 2499})
attempt 3: (409, {'error': 'key reused with different body'})
attempt 4: (200, {'auth_id': 'A77fe23', 'amount': 2499})

rows in dedup store:
  ('k7e21f9c', 'a3f9...c104', 'ok', '{"auth_id": "A4b9c1e", "amount": 2499}')
  ('k_other',  'a3f9...c104', 'ok', '{"auth_id": "A77fe23", "amount": 2499}')

Walkthrough. The line DB.execute("BEGIN IMMEDIATE") is what serialises concurrent retries: it takes SQLite's write lock up front, so the SELECT and the INSERT that follow behave as one atomic check-and-insert, and two retries arriving in the same millisecond cannot both win the "first" slot (Postgres offers SELECT ... FOR UPDATE and INSERT ... ON CONFLICT for the same job). The line if stored_hash != h: return 409 is the conflict detector: same key, different body means the client has a bug (or worse, a key collision), and silently returning the cached response would be wrong because that response describes a different operation. The comment # side effect runs OUTSIDE the dedup transaction marks a critical choice: if the card-network call held the database transaction open for 380 ms, every other write to the idem table would block. The dedup INSERT commits in milliseconds; the side effect runs against the now-pending row; the UPDATE that flips it to ok runs in a fresh small transaction.

Why hashing the body: without req_hash, attempt 3 in the run above (same key, ₹9999 instead of ₹2499) would have returned the cached ₹2499 response, and the user would believe their ₹9999 charge succeeded when in fact nothing happened. The hash is the integrity fence that turns "I have seen this key" into "I have seen this exact request before". Stripe's API does this; so do AWS APIs that take a client token (EC2 returns IdempotentParameterMismatch when a token is reused with different parameters); PaySetu's internal RPC framework hashes a deterministic subset of request fields (amount, currency, merchant_id, intent_id) and excludes timestamps and trace IDs. Why a TTL on the dedup table matters: the entries cannot live forever or the table grows unbounded. Typical retention is 24 hours: long enough that any reasonable retry has either succeeded or been abandoned, short enough that the table stays small. After expiry, the same key arriving again is treated as a fresh request, which is correct because no real client retries 24 hours later for a transient network failure.
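
A minimal sketch of TTL enforcement as a periodic purge, using SQLite as in the example above. The trimmed table layout and the choice to leave pending rows untouched (so an in-flight request never loses its claim) are assumptions of this sketch, not something the text prescribes.

```python
import sqlite3
import time

TTL_SECONDS = 24 * 3600   # the 24-hour retention window from the text

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idem (key TEXT PRIMARY KEY, "
           "status TEXT NOT NULL, created_at REAL NOT NULL)")

def purge_expired(conn: sqlite3.Connection, now: float) -> int:
    """Delete completed entries older than the TTL; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM idem WHERE created_at < ? AND status != 'pending'",
        (now - TTL_SECONDS,))
    conn.commit()
    return cur.rowcount

now = time.time()
db.executemany("INSERT INTO idem VALUES (?,?,?)", [
    ("k_old",   "ok",      now - 25 * 3600),   # past the TTL, completed
    ("k_new",   "ok",      now - 1 * 3600),    # inside the TTL
    ("k_stuck", "pending", now - 30 * 3600),   # old but still pending
])
removed = purge_expired(db, now)
remaining = [r[0] for r in db.execute("SELECT key FROM idem ORDER BY key")]
print(removed, remaining)   # 1 ['k_new', 'k_stuck']
```

A stuck pending row still needs its own handling, for example a deadline after which it is marked failed; the purge alone does not resolve it.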

Where idempotency keys actually live in production

The header name varies but the contract is the same. Stripe accepts an Idempotency-Key header on mutating (POST) requests and keeps the dedup entry for 24 hours. AWS services take the token as a per-API request parameter rather than a shared header (ClientToken on EC2, for example); X-Amzn-RequestId identifies a request for tracing and is not a dedup key. gRPC carries it as a metadata key by convention (x-idempotency-key). PaySetu uses an X-Idempotency-Key header on every /v1/payments/* write, and the gateway will reject any POST to those paths that arrives without one.
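
The gateway rule in the last sentence could look roughly like this. The function name, the 400 status code, and the return shape are assumptions; the text only says the request is rejected.

```python
from typing import Mapping, Tuple

PROTECTED_PREFIX = "/v1/payments/"
HEADER = "x-idempotency-key"       # HTTP header names are case-insensitive

def gateway_check(method: str, path: str,
                  headers: Mapping[str, str]) -> Tuple[int, str]:
    """Reject payment writes that arrive without an idempotency key."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if method.upper() == "POST" and path.startswith(PROTECTED_PREFIX):
        if not lowered.get(HEADER, "").strip():
            return 400, "missing X-Idempotency-Key"   # status code is an assumption
    return 200, "forward to payments service"

ok = gateway_check("POST", "/v1/payments/charge", {"X-Idempotency-Key": "k7e21f9c"})
missing = gateway_check("POST", "/v1/payments/charge", {})
read = gateway_check("GET", "/v1/payments/123", {})
print(ok[0], missing[0], read[0])   # 200 400 200
```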

The trickier production decision is who generates the key. Three patterns:

Client-generated UUID per call. The merchant SDK creates a fresh uuid4() for each logical user action and reuses it across retries of that action. This is what Stripe recommends for external API consumers. Drawback: the client has to know what "the same logical action" means; double-tapping Pay is two actions to a careless SDK.
Deterministic key derived from business identifiers. key = sha256(f"{merchant_id}:{order_id}:{attempt_seq}"). The merchant cannot accidentally generate a fresh key for a retry because the inputs are fixed, provided attempt_seq numbers the logical attempt (a second try after a decline, say) and is not incremented on network retries. This is the pattern PlayDream uses for fantasy-team submissions during the toss spike: the contest_id + user_id + entry_seq is the dedup key, and any duplicate submission for the same triple is rejected at the gateway.
Server-issued nonces via a pre-flight endpoint. Client calls POST /idempotency-tokens to get a token, then attaches it to the actual mutating call. Adds an RTT but lets the server enforce token uniqueness centrally. RailWala uses this for Tatkal-window booking attempts — the booking-attempt-token is issued by the booking service before the user fills in passenger details, so even if the user's app retries the submit, the same token comes back.

Each pattern handles "the same logical action" differently; pattern 2 is the most foolproof because it removes client memory from the equation, but it requires that all the inputs to the hash actually exist before the request is sent.
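
Patterns 1 and 2 can be sketched side by side. `send_with_retries`, `flaky_send`, and the sample identifiers are hypothetical; the point is only that the key is fixed before the first attempt and never regenerated inside the retry loop.

```python
import hashlib
import uuid

def deterministic_key(merchant_id: str, order_id: str, attempt_seq: int) -> str:
    """Pattern 2: the same business identifiers always yield the same key."""
    raw = f"{merchant_id}:{order_id}:{attempt_seq}".encode()
    return hashlib.sha256(raw).hexdigest()

def send_with_retries(send, body: dict, max_attempts: int = 3):
    """Pattern 1: one uuid4 per logical action, reused on every retry."""
    key = str(uuid.uuid4())            # generated once, outside the loop
    for _ in range(max_attempts):
        try:
            return send(key, body)
        except TimeoutError:
            continue                   # retry with the SAME key
    raise TimeoutError("all attempts timed out")

seen_keys = []
def flaky_send(key: str, body: dict) -> int:
    """Fake transport: times out twice, then succeeds."""
    seen_keys.append(key)
    if len(seen_keys) < 3:
        raise TimeoutError
    return 200

status = send_with_retries(flaky_send, {"amount": 2499})
k1 = deterministic_key("m_42", "ord_1001", 1)
k2 = deterministic_key("m_42", "ord_1001", 1)
k3 = deterministic_key("m_42", "ord_1001", 2)
print(status, len(set(seen_keys)), k1 == k2, k1 == k3)   # 200 1 True False
```

A double-tap on Pay is where the two differ: pattern 1 produces two keys unless the SDK treats both taps as one action, while pattern 2 produces the same key as long as the order and attempt number are unchanged.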

Common confusions

"Idempotent operations don't need idempotency keys." True for naturally-idempotent operations (PUT /resource/123 with a full replacement body, DELETE /resource/123). False for the operations that actually drive business: POST /charges, POST /orders, POST /sms. A DELETE retried twice is fine; a POST /charges retried twice without a key bills the customer twice. The key is for operations whose application-level semantics are not idempotent, and HTTP makes no idempotency promise for POST in the first place.
"At-most-once delivery solves this." At-most-once means "the network may drop your message, but if it arrives it arrives once". The application still sees retries — your own SDK retries on timeout, your user retries by re-tapping. Network-layer at-most-once does nothing about application-layer or user-layer retries, which is where double-charges actually originate.
"Exactly-once delivery is what idempotency keys give you." No. Keys give you idempotent processing on top of at-least-once delivery. The wire still delivers N times; the receiver makes the side effect happen once. The two-generals problem proves you cannot have true exactly-once delivery; idempotency keys are the practical workaround that gives the same observable property at the application layer. See RPC semantics: at-most-once, at-least-once, exactly-once.
"A UUID in a Redis SET with NX is good enough." It is good enough if the side effect is contained inside the same Redis call. It is not good enough if the side effect (charging a card, calling a third-party API, writing to a different database) is still running, or can still fail, after the key is set but before its outcome is stored. A retry that arrives in that window sees the key and reports success, while the original may yet fail, and your accounting is now wrong. The dedup store and the side-effect outcome must be linked atomically, typically by storing both in the same transactional database, or by using the dedup store as a state machine that the side effect drives forward.
"Idempotency keys make distributed transactions unnecessary." They make retries safe. They do not make a transfer between two services atomic. Transferring ₹500 from PaySetu wallet to PaisaCard rewards still requires either a saga (with its own per-step idempotency keys for the compensations) or a 2PC; the idempotency key prevents the first leg from being charged twice on retry, but it does not prevent the first leg from succeeding while the second leg fails.
"Stale dedup entries should be kept forever for safety." No. A retry 30 days later is not a retry; it is a new request that happens to share a key, almost certainly because the client regenerated the same key by accident. Keeping entries forever bloats the table and creates surprising 200-OK-with-no-side-effect outcomes long after the original work has been forgotten. 24 hours is a common TTL; Stripe, for example, documents that keys may be pruned once they are at least 24 hours old.

Going deeper

The transactional outbox pattern as the dual

When the side effect is itself a write to another service (Kafka publish, downstream gRPC call), the idempotency-key dedup row and the outbound message must be inserted in the same database transaction — otherwise you are back to the same dual-write problem the key was meant to solve. The transactional-outbox pattern stores the outbound message in a local outbox table inside the same transaction as the dedup INSERT and the business write. A separate poller reads outbox and ships rows to Kafka with at-least-once semantics; the consumer on the other side has its own idempotency key (often the outbox row id) and dedups on receipt. This is how PaySetu's payment-status-event stream stays consistent with the payments table even when Kafka has a 3-second blip.
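
A minimal sketch of the outbox idea, using SQLite in place of Postgres and a callback in place of Kafka. Table names, `record_charge`, and `poll_outbox` are hypothetical; the property shown is that the dedup row, the business row, and the outbound message commit or roll back together.

```python
import json
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT are issued explicitly
db = sqlite3.connect(":memory:", isolation_level=None)
db.executescript("""
CREATE TABLE idem    (key TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE charges (id INTEGER PRIMARY KEY, idem_key TEXT NOT NULL, amount INTEGER NOT NULL);
CREATE TABLE outbox  (id INTEGER PRIMARY KEY, payload TEXT NOT NULL,
                      shipped INTEGER NOT NULL DEFAULT 0);
""")

def record_charge(key: str, amount: int) -> bool:
    """Write dedup row, charge, and outbound event in ONE transaction."""
    db.execute("BEGIN IMMEDIATE")
    try:
        db.execute("INSERT INTO idem VALUES (?, 'ok')", (key,))
        db.execute("INSERT INTO charges (idem_key, amount) VALUES (?, ?)", (key, amount))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"event": "charged", "key": key, "amount": amount}),))
        db.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:      # duplicate key: nothing is written
        db.execute("ROLLBACK")
        return False

def poll_outbox(publish) -> int:
    """Ship unshipped rows; a crash before the UPDATE means a row is re-sent."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE shipped = 0 ORDER BY id").fetchall()
    for row_id, payload in rows:
        publish(row_id, payload)        # at-least-once: consumer dedups on row_id
        db.execute("UPDATE outbox SET shipped = 1 WHERE id = ?", (row_id,))
    return len(rows)

published = []
first = record_charge("k1", 2499)
dup = record_charge("k1", 2499)
shipped = poll_outbox(lambda i, p: published.append((i, json.loads(p)["key"])))
shipped_again = poll_outbox(lambda i, p: published.append((i, json.loads(p)["key"])))
n_charges = db.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
print(first, dup, shipped, shipped_again, n_charges, published)
```

The outbox row id doubles as the consumer-side idempotency key, which is what makes the poller's at-least-once shipping safe.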

Postgres `INSERT ... ON CONFLICT` versus `SELECT FOR UPDATE`

Both patterns work but have different concurrency profiles. INSERT ... ON CONFLICT (key) DO NOTHING RETURNING key is a single round-trip: if a row comes back you were the inserter, and if zero rows come back the key already existed. SELECT ... FOR UPDATE followed by an INSERT is two round-trips and holds the row lock longer. For a hot key (the same key arriving from many retries), ON CONFLICT is usually cheaper because a duplicate that finds a committed row simply skips the insert without taking a row lock. The downside is that DO NOTHING returns nothing for the existing row, so you cannot read its status in the same statement; you need a follow-up SELECT. Brandur Leach's Postgres write-up (cited below) walks through one concrete schema and locking strategy.
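
The claim-then-read shape can be tried locally. This sketch uses SQLite (which has supported ON CONFLICT ... DO NOTHING since 3.24) as a stand-in for Postgres, and checks the cursor's rowcount where Postgres code would check whether RETURNING produced a row; the `claim` helper is hypothetical.

```python
import sqlite3
from typing import Tuple

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idem (key TEXT PRIMARY KEY, status TEXT NOT NULL)")

def claim(key: str) -> Tuple[bool, str]:
    """Try to become the inserter; on conflict, read the existing status."""
    cur = db.execute(
        "INSERT INTO idem (key, status) VALUES (?, 'pending') "
        "ON CONFLICT(key) DO NOTHING",
        (key,))
    db.commit()
    if cur.rowcount == 1:
        return True, "pending"          # we inserted: go run the side effect
    # conflict path: the insert was skipped, so a follow-up SELECT is needed
    status = db.execute("SELECT status FROM idem WHERE key = ?",
                        (key,)).fetchone()[0]
    return False, status

winner = claim("k1")
loser = claim("k1")
print(winner, loser)   # (True, 'pending') (False, 'pending')
```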

Where the key cannot be the only defence: the trusted-input boundary

Idempotency keys are generated by the client. A malicious client can generate one fresh key per retry, defeating the dedup. This is why payment systems layer idempotency keys underneath a velocity-limit check: the gateway tracks per-card per-merchant request rates, and even a unique-key flood gets blocked by the velocity limiter. The key handles benign retries; the rate limiter handles adversarial retries. Both must exist; neither is sufficient alone. CricStream's pay-per-view checkout had an outage in 2024 when a misbehaving SDK generated a fresh UUID on every retry — the dedup table was not the line of defence that saved them; the per-card velocity limit at the card-network gateway was.
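
A velocity limit of the kind described can be sketched as a sliding window keyed by (card, merchant). The class, the limit of 3 requests per 60 seconds, and the sample identifiers are assumptions for illustration; note that the limiter never looks at the idempotency key, which is why fresh-key floods do not get past it.

```python
from collections import defaultdict, deque

class VelocityLimiter:
    """Sliding-window counter per (card, merchant) pair."""
    def __init__(self, max_requests: int, window_s: float):
        self.max_requests = max_requests
        self.window_s = window_s
        self._hits = defaultdict(deque)     # (card, merchant) -> timestamps

    def allow(self, card: str, merchant: str, now: float) -> bool:
        hits = self._hits[(card, merchant)]
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()                  # drop timestamps outside the window
        if len(hits) >= self.max_requests:
            return False                    # blocked regardless of the key sent
        hits.append(now)
        return True

limiter = VelocityLimiter(max_requests=3, window_s=60.0)
# Five requests one second apart, each imagined to carry a fresh UUID key.
results = [limiter.allow("4111", "m_42", now=float(t)) for t in range(5)]
later = limiter.allow("4111", "m_42", now=61.0)
print(results, later)   # [True, True, True, False, False] True
```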

Reproduce this on your laptop

python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
python3 idempotency.py

# To inspect a real Stripe-style header on the wire:
curl -H "Idempotency-Key: $(uuidgen)" \
     -H "Content-Type: application/json" \
     -d '{"amount": 2499, "currency": "inr"}' \
     -X POST https://httpbin.org/post | jq '.headers'

Where this leads next

Idempotency keys are the bottom layer of safe retries. The layers above them, in this curriculum:

Retries and exponential backoff — how a client should retry once an idempotency key makes it safe, including jitter and the budget arithmetic that prevents retry storms.
RPC semantics: at-most-once, at-least-once, exactly-once — the formal classification that idempotency keys live inside; "effectively-once business semantics on top of at-least-once delivery" is the precise claim.
Deadlines and deadline propagation — a retry that arrives after the deadline has expired must be refused even if the key has never been seen; the deadline is what bounds how long the dedup entry's pending state is allowed to last.

Beyond Part 4, idempotency keys appear again in Part 14 (saga compensations need their own per-step keys so a re-issued compensation does not double-refund), in Part 15 (Kafka consumers track committed offsets, but the processing of each message must be idempotent or the at-least-once delivery becomes at-least-twice in user-visible state), and in Part 16 (Temporal activities have a built-in activity_id that functions as an idempotency key across worker restarts).

References

"Designing robust and predictable APIs with idempotency" — Brandur Leach, Stripe Engineering, 2017 — the canonical write-up of how Stripe implements Idempotency-Key, including the request-hash check and the 24-hour retention.
"Implementing Stripe-like idempotency keys in Postgres" — Brandur Leach, 2017 — implementation deep-dive with the actual schema, locking strategy, and recovery semantics for crashes mid-side-effect.
"Making retries safe with idempotent APIs" — AWS Architecture Blog — Amazon's guidance on idempotency tokens across AWS services, with discussion of when server-issued vs client-generated tokens are appropriate.
"Pat Helland — Life Beyond Distributed Transactions: an Apostate's Opinion" (CIDR 2007) — the foundational paper arguing that durable, dedupable activities replace distributed transactions in scaled systems; idempotency keys are the practical embodiment.
RFC 7231 §4.2.2 — Idempotent Methods, IETF — the HTTP-spec definition of idempotency at the protocol level, and why POST is intentionally outside the set.
"Exactly-Once Semantics Are Possible: Here's How Kafka Does It" — Confluent, 2017 — how a distributed log layers producer-id + sequence-number (effectively idempotency keys) with transactional commits to give effectively-once stream processing.
RPC semantics: at-most-once, at-least-once, exactly-once — internal companion. The formal semantic classification that idempotency keys live inside.
Deadlines and deadline propagation — internal companion. The deadline is what bounds how long the dedup entry's pending state may last; without it, a stuck side effect blocks every retry forever.