RPC semantics: at-most-once, at-least-once, exactly-once

It is 14:07 IST on a Saturday and Riya, on-call for PaySetu's payouts pipeline, is staring at a row in the merchant-ledger database that says merchant M-0048213 was credited ₹47,200 twice for the same UPI transaction reference. The payouts-router emitted one credit_merchant RPC. The merchant-balance service has the row stamped at 14:01:23.471 and 14:01:23.682 — 211 ms apart, same amount, same reference, two separate ledger entries. The router's logs show a single retry: attempt=1 deadline_exceeded, then attempt=2 ok. The first call did not time out at the balance service — it succeeded, the response just never came back. The router treated the silence as failure, retried, and the balance service, having no memory of the first call, credited the merchant a second time. Riya now owes Finance an explanation for ₹47,200 that exists in two places.

RPC semantics describe what the server promises about how many times a request executes when the network drops a message. At-most-once means zero-or-one (no retries — lose the request, lose the operation). At-least-once means one-or-more (retries — but duplicates are possible). Exactly-once is impossible at the wire because the client cannot distinguish "request lost" from "response lost", but is achievable at the application layer with at-least-once delivery plus a server-side dedup table keyed by an idempotency key. This chapter is about which semantic to choose, what each costs, and how dedup tables actually work in production.

What each semantic actually claims

The three semantics are not three options on a spectrum — they are three different promises the server is making about how many times the operation executed when the network behaves badly. The difference matters because the network has exactly four failure modes for any RPC, and the semantic determines which of those four the client and server collude to handle.

Figure: four network failure modes (request lost, response lost, server crash before reply, server crash after reply) mapped to the three RPC semantics.

  at-most-once (no retry, fire-and-pray):
    request lost: 0 executions (silent loss)
    response lost: 1 execution (client thinks 0)
    crash pre-reply: 0 or partial (silent loss)
    crash post-reply: 1 execution (client thinks 0)
  at-least-once (retry until ack):
    request lost: 1 execution (retry succeeds)
    response lost: 2 executions (duplicate!)
    crash pre-reply: 1 execution (partial + redo)
    crash post-reply: 2 executions (duplicate!)
  exactly-once* (at-least-once + dedup):
    request lost: 1 effect (retry, server applies)
    response lost: 1 effect (2 calls, dedup wins)
    crash pre-reply: 1 effect (retry on new node)
    crash post-reply: 1 effect (dedup table absorbs)

  * Exactly-once at the wire is impossible. Exactly-once effect requires at-least-once delivery plus a server-side dedup table keyed by an idempotency key.
Illustrative — the three semantics across the four network failure modes. The "crash post-reply" case is the semantic killer: the server did the work, the client never got the answer, and the only honest fix is for the server to remember it already did the work.

Why these four failure modes are exhaustive: every RPC is two messages — request out, response back — separated by server-side work. Either message can be lost, and the server work happens at some point in between. So the failure axes are (request reached server: yes/no) × (response reached client: yes/no) × (server completed work: before crash / after crash). The four cases above cover the interesting cells; the trivial "everything worked" case is not a failure mode.

The asymmetry that breaks naive intuition is that the client cannot tell request lost from response lost. From the client's perspective both look identical — a deadline expires with no reply. Yet from a correctness standpoint these are opposite situations: in the first, the operation never happened (safe to retry); in the second, it did happen (retrying duplicates it). Because the client cannot distinguish them, the choice of semantic is really the choice of which side eats the cost of ambiguity.
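The ambiguity is easy to make concrete. A toy sketch (names illustrative): both failure modes return the identical observation to the client, while the server-side state differs.

```python
def rpc_attempt(request_lost: bool, response_lost: bool, state: dict) -> str:
    """Return what the CLIENT observes; mutate state only if the server ran."""
    if request_lost:
        return "timeout"         # operation never executed: safe to retry
    state["executions"] += 1     # server did the work
    if response_lost:
        return "timeout"         # operation executed, but the client cannot tell
    return "ok"

state = {"executions": 0}
print(rpc_attempt(request_lost=True, response_lost=False, state=state))  # timeout
print(rpc_attempt(request_lost=False, response_lost=True, state=state))  # timeout
print("executions:", state["executions"])                                # executions: 1
# Two identical observations ("timeout"), two opposite realities on the server.
```

Any retry policy the client picks must treat these two cases the same way, because it cannot see the difference.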

At-most-once: the simplest, almost never what you want

At-most-once says: send the request, accept any reply that arrives, and if no reply arrives within the deadline, declare failure and do not retry. The operation either ran zero times or one time, never two. The cost is simple: every transient network blip becomes a user-visible failure, and the failure rate is exactly the network's per-RPC loss rate. For the 0.4% per-segment loss that PaySetu measured in the previous chapter on a 14-segment RPC, that is 1 - (1 - 0.004)^14, roughly 5.5% of calls failing as the user-visible error rate.
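The arithmetic is worth having at hand for plugging in your own loss rate and hop count. A quick sketch:

```python
# An RPC survives only if every segment survives, so the per-RPC failure
# probability compounds: P(fail) = 1 - (1 - p_loss) ** n_segments.
p_loss = 0.004    # 0.4% per-segment loss (PaySetu's measurement)
n_segments = 14   # segments traversed by one RPC

p_fail = 1 - (1 - p_loss) ** n_segments
print(f"user-visible failure rate under at-most-once: {p_fail:.2%}")  # 5.46%
# Under at-most-once, every one of these losses surfaces as an error to the user.
```

Note how quickly this grows: halving the per-segment loss roughly halves the failure rate, but adding hops multiplies it back.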

At-most-once is correct for non-idempotent operations whose duplicate cost is higher than their failure cost. The classic example used to be UDP-based DNS — but DNS is actually idempotent (a query has no side-effects), so DNS retries freely. The genuinely at-most-once cases are narrow: telemetry pings where missing one out of 250 is fine, fire-and-forget audit log writes that have a separate reconciliation pass, push notifications where a duplicate would annoy the user but a missing one is the lesser evil. For payments, ledger entries, order placement, sending money — at-most-once is wrong. It exchanges a small probability of correctness violation (duplicate charge) for a guaranteed user-visible failure rate of several percent. The trade is upside-down.

The other place at-most-once shows up — usually unintentionally — is when developers turn off retries because retries "caused incidents". This is the wrong fix. The incident was caused by retries on non-idempotent operations; the answer is to make the operations idempotent, not to disable retries.

At-least-once: the production default

At-least-once says: send the request, retry on failure (with backoff), and consider the call complete when some attempt returns success. The operation runs one or more times. Every distributed-systems infrastructure you actually use in production — Kafka producers, gRPC with retryPolicy, AWS SQS, Postgres replication — defaults to at-least-once. The reason is fallacy 1 (the network is not reliable): if you do not retry, you lose work; if you do retry, you may double-deliver; and double-delivery, while painful, is at least recoverable if the receiver is idempotent.

The duplicate rate under at-least-once is the combined rate of the "response lost" and "crash post-reply" failures in the grid above. Empirically it is dominated by two causes: (1) the response was emitted but a TCP segment in the response path was dropped and the deadline expired before retransmit, and (2) the server completed the operation, then crashed before its reply made it to the wire. Both look identical to the client; both produce a retry; the retry produces a duplicate.

The discipline that makes at-least-once tolerable is idempotency at the receiver. An operation f is idempotent if f(x) and f(x); f(x) produce the same final state. set_balance(account, 1000) is idempotent. add_to_balance(account, 100) is not. The transformation from non-idempotent to idempotent is usually one of three patterns:

  1. Restate the operation as an absolute write rather than a relative one (set_balance instead of add_to_balance), so a replay overwrites instead of accumulating.
  2. Carry an idempotency key with every request and keep a server-side dedup table recording which keys have already been applied.
  3. Guard the write with a conditional check on current state (a version number or compare-and-swap), so a replay against already-updated state is a no-op.

PaySetu's bug at the top of this chapter was: at-least-once retries (good) on a non-idempotent server (bad). The fix is the second pattern — an idempotency key carried in every payouts RPC and a dedup_table(key, response_blob, created_at) table at the receiver.
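The idempotency definition can be checked mechanically. A minimal sketch, with hypothetical set_balance/add_to_balance helpers standing in for the real ledger operations:

```python
# Idempotency check: applying f twice must leave the same state as applying it once.
def set_balance(state: dict, account: str, value: int) -> dict:
    """Absolute write: idempotent."""
    return {**state, account: value}

def add_to_balance(state: dict, account: str, delta: int) -> dict:
    """Relative write: NOT idempotent."""
    return {**state, account: state.get(account, 0) + delta}

s0 = {"M-0048213": 0}

once = set_balance(s0, "M-0048213", 1000)
twice = set_balance(set_balance(s0, "M-0048213", 1000), "M-0048213", 1000)
print(once == twice)   # True: a retry is harmless

once = add_to_balance(s0, "M-0048213", 100)
twice = add_to_balance(add_to_balance(s0, "M-0048213", 100), "M-0048213", 100)
print(once == twice)   # False: a retry double-credits
```

The second result is PaySetu's bug in four lines: credit_merchant was a relative write, and the retry applied it again.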

Exactly-once is a marketing word at the wire — but achievable at the application

The phrase "exactly-once delivery" at the network layer is not achievable. The argument is short: the client cannot distinguish "request lost" from "response lost". To recover from "request lost" the client must retry. But "response lost" looks identical, so the client may retry an already-delivered message. The only ways to break this symmetry require either an additional message (which itself can be lost — recurses) or the client and server agreeing on durable state about what was delivered (which is the dedup-table approach, and is what people actually mean by "exactly-once").

What is achievable is exactly-once effect: the operation's effect on the system's state happens exactly once, even though the wire-level message may be delivered multiple times. The recipe is:

  1. Client generates a stable idempotency key before sending the first attempt. This key is not derived from anything random per-attempt — it identifies the user-intended operation, not the wire request. PaySetu uses <merchant_id>:<utr>:<settlement_date> (the bank's UTR is the natural idempotency key for any UPI-derived operation).
  2. Server checks the dedup table before executing. If the key is present and the previous execution succeeded, the server replays the stored response without re-executing the operation. The semantics from the client's perspective are: "I asked once, you answered once, and the second attempt produced the same answer."
  3. Server's dedup-table write and the operation's state change are atomic — usually in the same transaction. This is the load-bearing detail. If you record "key K processed" before completing the operation, a crash between the two leaves the system claiming the operation ran while no state changed. If you record "key K processed" after, a crash between the two leaves the operation done but the dedup state missing — and the retry duplicates.

The framework Kafka calls "exactly-once semantics" (EOS) is this pattern at scale: producer-side idempotency (sequence numbers per partition + producer ID) plus consumer-side transactional commits (offset commit and downstream write in one Kafka transaction). It is not magic — it is at-least-once delivery wrapped in dedup state at every boundary. The "cost" of EOS is the dedup-state storage and the transactional coordination, both real.

Why "atomic with the operation" is the part most implementations get wrong: a dedup table is a piece of state, the operation changes another piece of state, and atomicity across two pieces of state is the same problem as a distributed transaction. Either both states live in the same database (use a SQL transaction — easy), or you use a write-once log where the "operation outcome + dedup record" is one durable entry that is either applied or not (Kafka transactions, Spanner's transactional writes). What does not work: a Redis-based dedup table with the operation in a separate Postgres database. A crash between the two writes loses correctness exactly the way you were trying to prevent.
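The same-database case can be sketched in a few lines, with sqlite3 standing in for Postgres (table and column names are illustrative, not a production schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER)")
conn.execute("CREATE TABLE dedup (idem_key TEXT PRIMARY KEY, response TEXT)")
conn.execute("INSERT INTO balances VALUES ('M-0048213', 0)")
conn.commit()

def credit(conn, idem_key: str, account: str, amount: int) -> str:
    """Dedup write and balance update land in ONE transaction, or neither does."""
    cur = conn.cursor()
    # The unique index on idem_key is the arbiter: rowcount == 0 means seen before.
    cur.execute("INSERT OR IGNORE INTO dedup (idem_key, response) VALUES (?, 'pending')",
                (idem_key,))
    if cur.rowcount == 0:
        conn.rollback()  # nothing changed; replay the stored response
        return cur.execute("SELECT response FROM dedup WHERE idem_key = ?",
                           (idem_key,)).fetchone()[0]
    cur.execute("UPDATE balances SET amount = amount + ? WHERE account = ?",
                (amount, account))
    new = cur.execute("SELECT amount FROM balances WHERE account = ?",
                      (account,)).fetchone()[0]
    response = f"ok:{new}"
    cur.execute("UPDATE dedup SET response = ? WHERE idem_key = ?", (response, idem_key))
    conn.commit()  # dedup row and balance change commit together
    return response

print(credit(conn, "UTR-1001", "M-0048213", 47200))  # ok:47200
print(credit(conn, "UTR-1001", "M-0048213", 47200))  # ok:47200 (replayed, no second credit)
```

A crash before the commit rolls back both writes, so a retry re-applies cleanly; a crash after the commit leaves the dedup row in place, so a retry replays the stored response. There is no window where one state exists without the other.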

A runnable demonstration — at-least-once + dedup table

The following script is a self-contained exactly-once-effect server. The server simulates a flaky network (15% chance of dropping the response on the way back) and a client that retries on timeout. The dedup table is in-memory for clarity; in production it would be a SQL row with a unique constraint on the idempotency key, or a DynamoDB item with attribute_not_exists condition.

# rpc_dedup_demo.py — at-least-once delivery + server-side dedup = exactly-once effect
import random
import time
from dataclasses import dataclass, field
from typing import Optional

random.seed(7)

@dataclass
class Server:
    """Merchant-balance service. Holds a balance and a dedup table."""
    balance: int = 0
    dedup: dict = field(default_factory=dict)  # key -> stored response
    response_drop_rate: float = 0.15

    def credit(self, idem_key: str, amount: int) -> Optional[dict]:
        # 1. Check dedup first — if seen, replay the stored answer
        if idem_key in self.dedup:
            print(f"  [server] dedup hit for {idem_key}, replaying response")
            response = self.dedup[idem_key]
        else:
            # 2. Apply the operation atomically with the dedup write
            self.balance += amount
            response = {"ok": True, "new_balance": self.balance, "idem_key": idem_key}
            self.dedup[idem_key] = response  # atomic with the balance update in real DB
            print(f"  [server] applied credit {amount}, balance now {self.balance}")
        # 3. Simulate the network: drop the response with some probability
        if random.random() < self.response_drop_rate:
            print(f"  [server] response DROPPED on the wire for {idem_key}")
            return None
        return response


def client_credit(server: Server, idem_key: str, amount: int, max_attempts: int = 5) -> dict:
    """Client retries on timeout (None response) with the *same* idem_key."""
    for attempt in range(1, max_attempts + 1):
        print(f"[client] attempt {attempt} for {idem_key} amount={amount}")
        response = server.credit(idem_key, amount)
        if response is not None:
            print(f"[client] got response: {response}")
            return response
        print(f"[client] timeout, will retry")
        time.sleep(0.05)  # backoff
    raise RuntimeError(f"exhausted retries for {idem_key}")


# Simulate 5 distinct user-intended credits, each retried until a response arrives
server = Server()
for utr in ["UTR-1001", "UTR-1002", "UTR-1003", "UTR-1004", "UTR-1005"]:
    print(f"\n=== Operation {utr} ===")
    client_credit(server, idem_key=utr, amount=1000)

print(f"\nFinal balance: {server.balance}")
print(f"Dedup-table size: {len(server.dedup)}")
print(f"Expected balance (5 ops × 1000): 5000")

Sample run:

=== Operation UTR-1001 ===
[client] attempt 1 for UTR-1001 amount=1000
  [server] applied credit 1000, balance now 1000
[client] got response: {'ok': True, 'new_balance': 1000, 'idem_key': 'UTR-1001'}

=== Operation UTR-1002 ===
[client] attempt 1 for UTR-1002 amount=1000
  [server] applied credit 1000, balance now 2000
  [server] response DROPPED on the wire for UTR-1002
[client] timeout, will retry
[client] attempt 2 for UTR-1002 amount=1000
  [server] dedup hit for UTR-1002, replaying response
[client] got response: {'ok': True, 'new_balance': 2000, 'idem_key': 'UTR-1002'}

=== Operation UTR-1003 ===
[client] attempt 1 for UTR-1003 amount=1000
  [server] applied credit 1000, balance now 3000
[client] got response: {'ok': True, 'new_balance': 3000, 'idem_key': 'UTR-1003'}

=== Operation UTR-1004 ===
[client] attempt 1 for UTR-1004 amount=1000
  [server] applied credit 1000, balance now 4000
  [server] response DROPPED on the wire for UTR-1004
[client] timeout, will retry
[client] attempt 2 for UTR-1004 amount=1000
  [server] dedup hit for UTR-1004, replaying response
[client] got response: {'ok': True, 'new_balance': 4000, 'idem_key': 'UTR-1004'}

=== Operation UTR-1005 ===
[client] attempt 1 for UTR-1005 amount=1000
  [server] applied credit 1000, balance now 5000
[client] got response: {'ok': True, 'new_balance': 5000, 'idem_key': 'UTR-1005'}

Final balance: 5000
Dedup-table size: 5
Expected balance (5 ops × 1000): 5000

Two of the five RPCs (UTR-1002, UTR-1004) had their response dropped, the client retried, and the server's dedup table caught the duplicate — replaying the stored response without re-executing the credit. The final balance is exactly ₹5,000 (5 operations × ₹1,000), not ₹7,000 (which is what naive at-least-once without dedup would have produced — two duplicates × ₹1,000 each). The load-bearing line is if idem_key in self.dedup: response = self.dedup[idem_key] — the server's commitment that any given idempotency key produces at most one balance change.

Why a stored response and not just a "key seen" boolean: the retry needs the same answer the original got. If the original returned {"new_balance": 2000} and the retry got back {"already processed"}, the client cannot reconstruct the merchant's balance. Replaying the stored response makes the retry transparent to the client — it sees what it would have seen if the first response had not been dropped. This is what "exactly-once effect" means from the application's point of view.

A second production tale — KapitalKite's order-placement deduplication

KapitalKite is a discount stockbroker; its order-placement RPC carries a user's BUY/SELL intent from the mobile app to the order-management system (OMS), which forwards to the exchange. In 2023 the team migrated the order-placement path from at-most-once (with a user-visible "order failed, please retry" toast) to at-least-once with idempotency keys. The motivation was that during high-volatility minutes — say, the first 30 seconds after a results announcement when share price was moving 4% per second — the previous failure rate of 0.7% on the order-placement RPC was producing 1,400 user-visible failures per second across 200,000 concurrent orders. Users would tap retry, and roughly 20% of the time would tap retry on an order that had actually placed (the "response lost" case), producing a second order at a worse price. Customer-support ticket volume during volatile windows was dominated by "I bought twice".

The fix was an idempotency key generated client-side on the mobile app (a UUIDv7 stamped at the moment the user tapped Buy), carried through every retry of the order-placement RPC, and stored in a Postgres oms_dedup table with a unique index on (user_id, idem_key). The OMS would INSERT ... ON CONFLICT (user_id, idem_key) DO NOTHING RETURNING id; if id came back the order was new and was forwarded to the exchange; if no rows came back the order had been seen before and the OMS replayed the stored response. The transactional invariant — oms_orders row written + oms_dedup row written, both in the same Postgres transaction — meant that a crash anywhere in the path was safe: either both rows landed (operation done) or neither did (retry placed it cleanly).

After the rollout, the user-visible failure rate dropped from 0.7% to 0.02%, and "I bought twice" tickets dropped to roughly zero. The unexpected secondary benefit: because the dedup table contained the original response, the mobile app could surface the order's outcome (price filled, partial fill, exchange-reject reason) on the retry even when the original response had been lost — a UX win that was not on the roadmap. Total engineering cost: about 3 weeks for two engineers, including the Postgres schema change, the client-side UUIDv7 generation, the migration of existing in-flight requests, and the runbook update.

Figure: KapitalKite order placement before and after idempotency keys, as two stacked flows. Before (at-least-once retry, no dedup): the mobile app retries a BUY, the OMS receives it twice, and the exchange gets two orders: 2 fills, the user paid twice. After (same retries, OMS dedup table keyed by idem_key): the retry carries the same idem_key, the OMS sees a dedup hit and replays the stored response, and the exchange gets one order: 1 fill, the user paid once. The fix is not at the wire (the network still drops responses); it is at the OMS, which remembers which idem_keys have already produced an order.
Illustrative — KapitalKite's order-placement flow before and after the idempotency-key rollout. The wire-level retries are the same; what changed is that the OMS now remembers which user-intended operations it has already processed.

Common confusions

  1. "Exactly-once delivery" vs exactly-once effect: the wire can always deliver a message twice; only the application, via dedup state, can make the effect happen once.
  2. "Retries caused the incident, so disable retries": the incident was caused by retries against a non-idempotent receiver; the fix is idempotency at the receiver, not silent data loss at the sender.
  3. Generating a fresh idempotency key per attempt: the key must identify the user-intended operation and stay stable across every retry, or the dedup table sees each retry as a new operation.
  4. "TCP is reliable, so duplicates cannot happen": TCP guarantees an ordered byte-stream within one connection; a reconnect-and-retry after a timeout is a new connection, and TCP knows nothing about what an operation is.

Going deeper

The "at-least-once + dedup = exactly-once" pattern, formalised

Saltzer, Reed, and Clark's end-to-end argument (1984) is the foundational reasoning here. The argument is: a property that the application requires must be implemented at the application layer, because the lower layers cannot know enough to implement it correctly. Exactly-once-effect is the canonical example. The network layer can guarantee best-effort delivery; the transport layer (TCP) can guarantee in-order, reliable byte-stream within a connection's lifetime; neither layer knows what operations the bytes represent. The dedup table sits at the application layer because only the application knows that two byte-streams represent "the same intent" and that one of them must be ignored. Lower-layer mechanisms (TCP retransmits, gRPC retries, queue redelivery) cannot deduplicate at the operation level because they do not know what an operation is.

Garbage-collecting the dedup table

A dedup table that keeps every idempotency key forever is a memory and storage liability. The standard pattern is to keep keys for a window — typically 24 hours to 7 days — chosen as max_request_lifetime + max_clock_skew + safety_margin. Beyond that window, the assumption is that no client will retry an operation older than the window. The window is enforced by either (a) a created_at column with a partial index and a periodic delete-where-old job, or (b) a TTL on the row (DynamoDB TTL, Redis EXPIRE). The bug to avoid: a client retry that arrives after the dedup row has been GC'd — the server will treat it as a fresh operation. The mitigation is to make the window strictly larger than any client's retry deadline. PaySetu's dedup window is 7 days and the client-side max retry deadline is 24 hours, leaving 6-day safety.
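The delete-where-old job is a few lines against the demo's in-memory table, extended with a created_at per key (window and names illustrative):

```python
import time

DEDUP_WINDOW_SECONDS = 7 * 24 * 3600  # e.g. PaySetu's 7-day window

dedup: dict = {}  # idem_key -> (response, created_at)

def record(key: str, response: dict, now: float) -> None:
    dedup[key] = (response, now)

def gc(now: float) -> int:
    """Drop keys older than the window. A retry arriving after GC would be
    treated as a fresh operation, so the window must exceed every client's
    retry deadline."""
    expired = [k for k, (_, created) in dedup.items()
               if now - created > DEDUP_WINDOW_SECONDS]
    for k in expired:
        del dedup[k]
    return len(expired)

now = time.time()
record("UTR-1001", {"ok": True}, now - 8 * 24 * 3600)  # 8 days old: past the window
record("UTR-1002", {"ok": True}, now - 3600)           # 1 hour old: kept
print("removed:", gc(now))  # removed: 1
print(sorted(dedup))        # ['UTR-1002']
```

In a SQL dedup table the same job is DELETE FROM dedup WHERE created_at < now() - interval '7 days', run periodically; DynamoDB TTL or Redis EXPIRE does it per-row.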

The "first attempt wins" vs "last attempt wins" decision

When two retries of the same idempotency key arrive concurrently — request and retry race because the deadline was set short — which one wins? The standard answer is "first attempt wins": the server inserts into the dedup table with INSERT ... ON CONFLICT DO NOTHING, and the loser of the race replays the winner's stored response. This is correct for almost all operations. The exception is operations whose payload may legitimately change between retries (rare — most operations have stable payloads across retries by construction); for those, you need application-level reasoning about which version is canonical, and the dedup table is not the right primitive.
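The first-attempt-wins race can be sketched with a lock standing in for the unique-index INSERT (the names are illustrative; in Postgres the lock is the constraint check itself):

```python
import threading

dedup: dict = {}
lock = threading.Lock()
results: list = []

def try_claim(key: str, attempt_id: int) -> tuple:
    """First attempt to claim the key wins; the loser replays the winner's response."""
    with lock:  # stands in for INSERT ... ON CONFLICT DO NOTHING on a unique index
        if key not in dedup:
            dedup[key] = f"processed by attempt {attempt_id}"
            return dedup[key], "winner"
    return dedup[key], "replayed"

threads = [threading.Thread(target=lambda i=i: results.append(try_claim("K1", i)))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
# Exactly one attempt is tagged "winner"; the other replays the identical response.
```

Whichever attempt wins, both return the same stored response, so the client cannot observe which retry raced ahead.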

Idempotency keys vs sequence numbers

Kafka's idempotent producer uses sequence numbers per (producer-ID, partition) instead of arbitrary client-generated keys. The trade-off is: sequence numbers compress beautifully (one int64 per partition) but require the producer to track its own "next seq" state durably across restarts; idempotency keys are simpler conceptually but cost storage proportional to operation rate. For high-throughput stream pipelines (millions of messages per second, like Kafka), sequence numbers are the right primitive. For RPC-shaped workloads (thousands of requests per second, like an OMS), idempotency keys are the right primitive. The decision is operation rate vs implementation complexity.
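The sequence-number alternative fits in a few lines: the server keeps one integer per producer and accepts only the next expected sequence. A simplified sketch in the spirit of Kafka's idempotent producer (not its actual protocol):

```python
class SeqDedupServer:
    """Per-producer sequence numbers: one int of dedup state instead of a key table."""
    def __init__(self):
        self.next_seq: dict = {}  # producer_id -> next expected sequence number
        self.log: list = []

    def append(self, producer_id: str, seq: int, payload: str) -> str:
        expected = self.next_seq.get(producer_id, 0)
        if seq < expected:
            return "duplicate"     # already applied: a retry of an acked message
        if seq > expected:
            return "out-of-order"  # a gap: reject so the producer re-sends in order
        self.log.append(payload)
        self.next_seq[producer_id] = seq + 1
        return "ok"

server = SeqDedupServer()
print(server.append("p1", 0, "credit 1000"))  # ok
print(server.append("p1", 0, "credit 1000"))  # duplicate (retry absorbed)
print(server.append("p1", 1, "credit 2000"))  # ok
print(server.append("p1", 3, "credit 9999"))  # out-of-order (gap detected)
print(server.log)                             # ['credit 1000', 'credit 2000']
```

The dedup state is O(producers), not O(operations), which is why the pattern wins at stream-pipeline throughput; the price is that the producer must durably track its own next sequence across restarts.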

Reproduce this on your laptop

# Reproduce the dedup demo
python3 -m venv .venv && source .venv/bin/activate
python3 rpc_dedup_demo.py

# Watch a real Postgres dedup table in action
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=demo postgres:16
sleep 5  # give Postgres a few seconds to start accepting connections
PGPASSWORD=demo psql -h localhost -U postgres <<EOF
CREATE TABLE oms_dedup (idem_key text PRIMARY KEY, response jsonb, created_at timestamptz DEFAULT now());
INSERT INTO oms_dedup (idem_key, response) VALUES ('K1', '{"ok":true}') ON CONFLICT DO NOTHING RETURNING idem_key;
INSERT INTO oms_dedup (idem_key, response) VALUES ('K1', '{"ok":true}') ON CONFLICT DO NOTHING RETURNING idem_key;
EOF
# First INSERT returns 1 row; second returns 0 rows — the duplicate detected.

Where this leads next

This chapter answered what the three semantics promise. The next chapters in Part 4 turn the answers into running code.

Beyond Part 4, exactly-once-effect is the load-bearing primitive behind much of Part 14 (distributed transactions — sagas use idempotent compensating actions), Part 15 (messaging — Kafka EOS, RabbitMQ idempotent consumers), and Part 16 (workflows — Temporal activities are required to be idempotent because the workflow engine retries them on worker crashes).

References

  1. End-to-End Arguments in System Design — Saltzer, Reed, Clark, ACM TOCS 1984. The foundational paper for "this property must live at the application layer, not the transport"; the formal reasoning behind why the dedup table is at the application.
  2. Implementing Remote Procedure Calls — Birrell & Nelson, ACM TOCS 1984. The original RPC paper from Xerox PARC; first articulation of at-most-once / at-least-once distinctions.
  3. Idempotent Requests — Stripe API documentation. The most widely cited industrial implementation of the pattern; concrete header conventions and storage windows.
  4. Exactly-Once Semantics in Apache Kafka — Confluent. The producer-side sequence-number plus consumer-side transactional-commit mechanism; the load-bearing claim "at-least-once delivery + atomic dedup state = exactly-once effect" made concrete.
  5. Designing Data-Intensive Applications — Kleppmann, O'Reilly 2017. Chapter 9 ("Consistency and Consensus") and Chapter 11 ("Stream Processing") for the formal treatment of delivery semantics across batch and stream systems.
  6. The fallacies of distributed computing (revisited) — Part 4 opener; sets up why the network's unreliability forces the choice this chapter answers.
  7. You Cannot Have Exactly-Once Delivery — Tyler Treat, 2015. The widely shared blog-post version of the impossibility argument, stated plainly.