In short

Dynamo-style leaderless databases solve two real problems — partition tolerance and horizontal write scale — that single-leader systems cannot. For those workloads where availability is existential (carts, sessions, telemetry, social inboxes), that trade is often worth it. But the cost is architectural, not just operational: eventual consistency leaks out of the database into the application. Your code must now handle five classes of problem that a Postgres primary hides entirely.

Leak 1 — conflict resolution in application code. Without a CRDT that fits your data, every read can return multiple concurrent versions and your application must merge them. Merge bugs are distributed-systems bugs.

Leak 2 — retry and idempotency. Writes can fail with QuorumFailed, or succeed with a lost ack. Retrying naively produces duplicates; every write must carry an operation id or a conditional-write predicate.

Leak 3 — session semantics. "I just posted; why don't I see my post?" Read-your-writes, monotonic-read, and causal-session guarantees require application-level session tokens, sticky routing, or middleware — none are free.

Leak 4 — no cross-key guarantees. Uniqueness ("one active subscription per user", "no two users with the same username") cannot be enforced by the database. Lightweight-transactions, lock services, or compensating workflows fill the gap at real cost.

Leak 5 — schema migrations are hard. Replicas may hold different versions of a record during a migration window. Additive-only, never-drop, rolling-deploy becomes the iron law.

Single-leader SQL hides all of this behind ACID. Leaderless exposes it. That is why most applications in 2026 still start with Postgres and migrate away only when a hard availability or scale wall forces the move. Build 11 pivots to the storage model designed around these trade-offs: wide-column stores (Cassandra, HBase, DynamoDB) — the physical shape that complements the distribution protocol you just built.

A user files a bug report. "I added milk to my cart at 8:47 PM. When I checked out at 9:02, milk was gone. Order placed without it. Please credit me."

You open the logs. The POST /cart/add at 20:47 returned 200 OK. The GET /cart at 21:02 returned a cart without milk. The POST /checkout at 21:03 consumed that cart. No exceptions, no 5xx, no restarts. Every component did what its contract said.

This is not a code bug. It is a consistency-model bug. Somewhere between 20:47 and 21:02, a concurrent write on another replica — a stale "clear cart" from an old tab, a background "mark abandoned" job, a remove that raced with the add — was reconciled against the add, and LWW threw the add away. The database did nothing wrong. Your application made a promise (the add "succeeded") that the database cannot keep on its own.

Every user-visible bug in a well-run leaderless deployment is a distributed-systems bug in disguise. Welcome to the wall.

The five leaks

Across the rest of this chapter, follow one structural claim: when you choose leaderless, the database stops being a black box. Concerns that a Postgres primary hides — serialisation of concurrent writes, atomic commit, uniqueness, read-your-writes, schema evolution — all become application responsibilities.

Figure — where the five concerns live. Two stacks compared. On the left, single-leader SQL (Postgres): a thick database layer holding all five concerns — serialised writes, ACID commit, read-your-writes, FK/UNIQUE/CHECK, online DDL, driver-level retries — under a thin application layer. On the right, Dynamo-style: a thin database layer (replication + quorum) with the same five concerns leaked upward into a thick application layer — merging concurrent versions, retry and idempotency, session tokens, check-then-write, rolling schema, compensations. Leaderless buys availability and horizontal write scale; your application pays for it.
Five concerns that a single-leader SQL database handles for you — conflict resolution, retry semantics, read-your-writes, cross-key invariants, schema migrations — are shifted upward into application code when you go leaderless. The database layer becomes thinner; the application layer becomes thicker. This is the structural cost of choosing AP over CP.

Name them precisely: merge logic in application code, retry and idempotency, session semantics, cross-key guarantees, and schema migrations.

Each deserves its own section. Each is a class of production bug that simply does not exist on Postgres.

Leak 1 — Merge logic in application code

You stored a user profile {name, email, address, phone} at key user:42. Two clients issued concurrent updates: a mobile app updated phone; a desktop app updated email. Both writes were accepted on different coordinators before they could see each other. Now replica X holds {name, email1, address, phone1} and replica Y holds {name, email2, address, phone2}, where only one field differs on each side.

The database has two options. It can apply LWW at the whole-document level and keep one update while silently discarding the other — the user's phone edit vanishes for no reason a human could explain. Or it can hand the application both versions (vector-clock siblings) and let the application merge. Either way, the problem has leaked upward.

If the application handles siblings, the merge function looks like this:

def merge_user_profile(versions: list[dict]) -> dict:
    """Merge concurrent sibling versions of a user profile.

    versions is a list of full profile dicts, each with a
    per-field timestamp like {"name": ("Priya", 1713000000), ...}.
    For each field, keep the (value, timestamp) pair with the
    largest timestamp.
    """
    fields = set().union(*(v.keys() for v in versions))
    merged = {}
    for f in fields:
        candidates = [v[f] for v in versions if f in v]
        merged[f] = max(candidates, key=lambda x: x[1])
    return merged

Why per-field LWW is safer than whole-document LWW: at the document level, the loser's edits to other fields get thrown away along with the losing field. Per-field LWW preserves all non-conflicting edits. The cost is that every field carries its own timestamp — the metadata overhead of the CRDT world, without the formal convergence guarantees. You are hand-building something that looks like an MV-Register without the rigor.
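The difference is small enough to run. A sketch with two siblings from the profile example above; whole_doc_lww and per_field_lww are illustrative helpers, not a real store's API:

```python
def whole_doc_lww(versions):
    # whole-document LWW: keep the sibling whose newest field wins overall
    return max(versions, key=lambda v: max(ts for _, ts in v.values()))

def per_field_lww(versions):
    # per-field LWW: for each field, keep the (value, ts) pair with max ts
    fields = set().union(*(v.keys() for v in versions))
    return {f: max((v[f] for v in versions if f in v), key=lambda p: p[1])
            for f in fields}

x = {"email": ("p@old.example", 100), "phone": ("555-0101", 200)}  # mobile edited phone
y = {"email": ("p@new.example", 210), "phone": ("555-0100", 100)}  # desktop edited email

assert whole_doc_lww([x, y]) == y            # the phone edit is silently lost
merged = per_field_lww([x, y])
assert merged["phone"] == ("555-0101", 200)  # both edits survive per-field
assert merged["email"] == ("p@new.example", 210)
```

Whole-document LWW picks y (its newest timestamp, 210, beats x's 200) and the phone edit vanishes; per-field LWW keeps both edits at the cost of a timestamp per field.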

The subtle bugs are endless. Nested objects become a sequence-CRDT problem you did not sign up for. References to other keys? A merge on user:42 might restore a stale team_id that was deliberately cleared. Derived fields — full_name computed from first_name + last_name? Merge components, then recompute.

You will get some of this wrong. Every team writing merge logic gets some of it wrong; bugs surface weeks later as "my preference flipped back" tickets. In Postgres these bugs are impossible because the primary serialises writes — the second UPDATE simply overwrites the first and no sibling ever exists.

Leak 2 — Retry and idempotency

A client issues PUT cart:priya [milk, bread] to a coordinator. The coordinator sends to N=3 replicas; W=2 required. Two acks come back from replicas X and Y; a third from Z was lost in a timeout. From the database's view, the write succeeded — two replicas have it, anti-entropy will propagate to Z.

The coordinator tries to ack the client. The ack packet drops. The client, after a 500 ms timeout, retries.

Now the client issues the same PUT again. The second PUT lands on a different coordinator, fans out to the same three replicas, all three accept — but some databases do not treat the two PUTs as the same operation. The retry may be recorded as a second concurrent write with its own version tag. On the next read the application sees two siblings — both identical on the wire, but semantically duplicates.

Worse: imagine the operation was not a PUT but a counter.increment. The retry double-counts. The customer sees +100 for a single purchase.

The fix is idempotency at the application level. Every write carries a client-generated operation id:

import uuid

def charge_wallet(db, user_id, amount_paise):
    op_id = str(uuid.uuid4())  # client generates once, retries reuse it
    payload = {"op_id": op_id, "amount": amount_paise}
    for attempt in range(5):
        try:
            return db.apply_charge(user_id, payload, timeout=1.0)
        except (QuorumFailed, NetworkTimeout):
            continue  # safe: op_id is the same across retries
    raise BackendUnavailable()

The server side deduplicates by op_id:

def apply_charge(db, user_id, payload):
    if db.seen_op(user_id, payload["op_id"]):
        return db.get_result(user_id, payload["op_id"])  # cached result
    result = perform_charge(user_id, payload["amount"])
    db.record_op(user_id, payload["op_id"], result)
    return result

Why the op-id table is itself a consistency problem: seen_op and record_op must be atomic together with perform_charge, otherwise a crash between check and record lets a retry double-charge. On a single-leader SQL database, a transaction wraps all three. On a leaderless store, you need a conditional write — "record this op_id only if it does not exist" — as a lightweight-transaction or compare-and-swap. The idempotency layer is itself a mini-transaction protocol your code must implement.
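The put-if-absent shape can be sketched in memory; OpTable and record_if_absent are illustrative names, and the dict check stands in for the conditional write a real leaderless store would need:

```python
class OpTable:
    """In-memory sketch of a per-user op-id dedup table. In a real
    leaderless store, record_if_absent must be a conditional write
    (LWT / compare-and-swap), not a Python dict check."""
    def __init__(self):
        self._results = {}  # (user_id, op_id) -> cached result

    def record_if_absent(self, user_id, op_id, compute):
        key = (user_id, op_id)
        if key not in self._results:      # atomic put-if-absent in the real store
            self._results[key] = compute()  # perform the charge exactly once
        return self._results[key]

calls = []
def perform_charge():
    calls.append(1)
    return {"charged": 500}

ops = OpTable()
r1 = ops.record_if_absent("priya", "op-123", perform_charge)
r2 = ops.record_if_absent("priya", "op-123", perform_charge)  # the retry
assert r1 == r2 and len(calls) == 1       # charge applied exactly once
```

The retry returns the cached result instead of charging again — but only because the check-then-compute-then-record sequence is atomic here, which is exactly what the real store must guarantee.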

The alternative to op-ids is conditional writes on version — "set this value only if the current version is v7". Every write becomes a read-modify-conditional-write loop. This is Dynamo's ConditionExpression shape; same complexity in a different place.

Leak 3 — Session semantics in application code

A user posts a comment. The POST /comments hits coordinator X, writes land on replicas X, Y, Z, coordinator X returns 201 Created. The user's next request — loading the page to see their comment — is routed by a load balancer to a different coordinator, W, which reads from replicas W, Y, X. Replica W has not yet received the write. The read completes before W catches up. The user's own comment is missing from the page.

This is not a bug. It is eventual consistency working as designed. A leaderless cluster does not guarantee a client will see their own writes on the next read. To get that guarantee, you need a session-level layer on top, covered in the read-your-writes, monotonic reads, causal sessions chapter. It has three implementation shapes, in ascending order of cost: sticky routing (pin each client to one coordinator), session tokens (carry last-seen versions on every request), and causal middleware (track and enforce ordering across keys). The session-token shape looks like this:

class CausalSession:
    def __init__(self, db):
        self.db = db
        self.observed = {}  # key -> last-seen version

    def write(self, key, value):
        version = self.db.put(key, value, after=self.observed)
        self.observed[key] = version

    def read(self, key):
        value, version = self.db.get(key, after=self.observed)
        self.observed[key] = max(self.observed.get(key, 0), version)
        return value

Why session tokens are not a free upgrade: blocking a read until a replica catches up turns a zero-latency request into one that waits for replication. Under heavy write load or partition, that wait can be seconds. Users experience "the page took forever to load" — the same user who experienced "my post is missing" under no-session-guarantee. The consistency dial is a dial, not a switch; every position has a cost.

Every Postgres application gets read-your-writes for free because the primary is the source of truth. In a leaderless store, you engineer for it, and the engineering is visible in the code — session.write(...) not db.write(...), session.read(...) not db.read(...).
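A toy stand-in makes the contract concrete. VersionedStore is a hypothetical single-node store of the shape CausalSession assumes: put returns a monotonically increasing version, and get(after=...) refuses to serve data older than the session's floor (a real cluster would wait for or re-route to a caught-up replica instead of raising):

```python
class VersionedStore:
    """Toy single-node stand-in for the store a causal session assumes."""
    def __init__(self):
        self._data = {}   # key -> (value, version)
        self._clock = 0

    def put(self, key, value, after=None):
        self._clock += 1
        self._data[key] = (value, self._clock)
        return self._clock            # version handed back to the session

    def get(self, key, after=None):
        value, version = self._data[key]
        if version < (after or {}).get(key, 0):
            # replica is behind the session floor; wait or re-route
            raise RuntimeError("stale replica for this session")
        return value, version

store = VersionedStore()
observed = {}                          # the session token: key -> version
observed["post:1"] = store.put("post:1", "hello", after=observed)
value, version = store.get("post:1", after=observed)
assert value == "hello"                # read-your-writes holds
```

The session token travels with the client; the cost is that a lagging replica must either block or bounce the request, which is where the latency in the paragraph above comes from.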

Leak 4 — No cross-key guarantees

"Each username must be unique" is a one-line constraint in Postgres:

CREATE TABLE users (
    username TEXT UNIQUE NOT NULL,
    ...
);

That constraint is enforced atomically with the insert because all writes pass through a single serialising leader. Two concurrent inserts for username = "priya" — one wins, one fails with duplicate key. The application sees a clean error and tells the user to pick another name.

In a leaderless database, this constraint cannot be enforced by the database. Two concurrent INSERTs with the same username hit different coordinators, both pass (no replica has seen the other), both are durably stored. Later, read-repair or anti-entropy discovers two rows with the same username and has no basis to prefer either. LWW may discard one; vector-clock siblings surface both. Your application now has duplicate usernames in production, and the "fix" — decide which one wins, delete the other, email the loser — is a business problem.

Three workarounds exist, each with costs: lightweight transactions (a consensus round per claim — correct, but at quorum-coordination latency), an external lock service (correct, but a second system to operate and a new point of coordination), and compensating workflows (accept the duplicate, detect it later, repair — eventually correct, embarrassing in the meantime).

The same pattern repeats for every cross-key invariant — "at most one active subscription", "no two orders share an invoice number". In Postgres each is a UNIQUE, CHECK, or FOREIGN KEY. In Dynamo each is a protocol.
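The lightweight-transaction shape amounts to a claim table keyed by the invariant. A sketch, with NameRegistry and ClaimFailed as illustrative names; in a real store the dict check would be a conditional write — Cassandra's INSERT ... IF NOT EXISTS or DynamoDB's attribute_not_exists condition:

```python
class ClaimFailed(Exception):
    pass

class NameRegistry:
    """Sketch of the lightweight-transaction workaround: claim a username
    with a conditional write ("insert if not exists"). A dict stands in
    for the store; the membership check must be atomic in the real thing."""
    def __init__(self):
        self._claims = {}

    def claim(self, username, user_id):
        if username in self._claims:      # conditional-write predicate
            raise ClaimFailed(username)
        self._claims[username] = user_id

reg = NameRegistry()
reg.claim("priya", "user:42")
try:
    reg.claim("priya", "user:99")         # concurrent second claimant
except ClaimFailed:
    pass  # the loser gets a clean error, as with Postgres UNIQUE
```

The clean error is restored — but every claim now pays a consensus round, which is why this path is reserved for the few keys that genuinely need it.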

Why this is the hardest leak to see until production: in a test suite with synthetic data and serial traffic, cross-key races never fire. In staging with a few dozen concurrent users, they fire once a week. In production on Black Friday with hundreds of thousands of concurrent sign-ups, they fire thousands of times per minute. The bug is invisible in development and catastrophic at scale. This is why Dynamo-style adoption almost always starts with "cart" or "session" — workloads where no cross-key invariant exists — and never with "ledger" or "user accounts".

Leak 5 — Schema migrations are hard

Adding a column to a Postgres table is ALTER TABLE users ADD COLUMN phone TEXT;. The DDL runs on the primary; replicas apply it via WAL; the schema is universal within seconds. Any application code that assumes phone exists will read NULL on old rows and something-meaningful on new ones.

In a leaderless store there is no "the primary" to run DDL on. When you change the shape of a value, old-format records persist on replicas indefinitely, new code writes the new shape while old code is still reading, and during the rolling-deploy window both versions of the code must parse both versions of the record.

The iron laws: never drop, additive only, rolling deploys, backwards-compatible parsing. Every schema change becomes a protocol change, because the database does not coordinate schema across replicas.
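Backwards-compatible parsing is the law that shows up in code. A sketch with a hypothetical record shape, adding the phone field from the Postgres example above:

```python
def parse_user(record: dict) -> dict:
    """Backwards-compatible reader: tolerate records written before the
    'phone' field existed, and never assume a field has been dropped."""
    return {
        "name": record["name"],
        "email": record["email"],
        "phone": record.get("phone"),  # additive field: default, never require
    }

old = {"name": "Priya", "email": "p@x.example"}                     # pre-migration
new = {"name": "Priya", "email": "p@x.example", "phone": "555-0100"}  # post-migration

assert parse_user(old)["phone"] is None
assert parse_user(new)["phone"] == "555-0100"
```

Every reader in the codebase must carry this tolerance for as long as old-shape records can surface from any replica — which, without coordinated compaction, is forever.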

The overall pattern — distributed-systems problems become CRUD problems

Step back. The leaks are not separate bugs; they are instances of one structural fact.

In a single-leader database, the distributed-systems concerns have been solved for you by the primary — it serialises writes, enforces invariants, runs schema, manages replica state. Your code sees a clean RPC.

In a leaderless store your application code is the coordinator. You merge versions; dedupe retries; manage sessions; enforce invariants; coordinate schema evolution. The database's RPC is much thinner — put, get, cas — and the hard parts are above the line.

This is what "eventual consistency leaks" really means. The database is not incorrect; the abstraction boundary moves. Concerns that used to live inside the database now live in your application.

A useful diagnostic: in a Postgres codebase, grep for SELECT FOR UPDATE, transaction blocks, ON CONFLICT, and foreign keys. Every one of those constructs, in a leaderless port, becomes an application-level protocol — CAS loops, op-id tables, lock-service calls, compensating workflows. The cost is spread across the entire codebase.

When this is worth paying

Despite the cost, leaderless is the right answer for a specific shape of workload: availability is existential (carts, sessions, telemetry, social inboxes — refusing a write costs more than losing one), write volume exceeds what a single leader can absorb, the deployment must be cross-region active-active, and the data merges naturally (sets, counters, registers that fit a CRDT).

When it is not

For most applications, Postgres is still the right answer in 2026: the data carries cross-key invariants (ledgers, user accounts, inventory), the workflows are transactional (checkout, transfers), the write volume fits on one primary with read replicas, and the team's time is better spent on the product than on merge functions and saga orchestrators.

Start with Postgres. Migrate away only when you hit a wall — Build 9's write ceiling, a cross-region active-active requirement, or a specific workload (cart, session, telemetry) that profits from availability.

Python demonstration — a "simple" read-modify-write

Show the leak on the smallest possible operation: incrementing a counter.

def increment_counter_naive(db, key):
    v = db.get(key)        # read current value
    db.put(key, v + 1)     # write back plus one

This is correct on Postgres with transactions; catastrophically wrong on Dynamo-style. Two concurrent callers both read v = 5, both write v + 1 = 6. One +1 is lost. The counter goes from 5 to 6 instead of from 5 to 7. If the workload is "count page views", your analytics drift downward by the concurrency rate. If the workload is "count stock-on-hand", you oversell.
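The interleaving can be replayed deterministically in a few lines; a plain dict stands in for the store:

```python
# The lost update, replayed serially: both callers read before either writes.
db = {"views": 5}

a = db["views"]        # caller 1 reads 5
b = db["views"]        # caller 2 reads 5, concurrently
db["views"] = a + 1    # caller 1 writes 6
db["views"] = b + 1    # caller 2 also writes 6 -- caller 1's +1 is gone

assert db["views"] == 6   # two increments, but the counter moved by one
```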

Fixing it requires either a CRDT counter (encode the merge into the type — see CRDTs chapter) or a compare-and-swap loop:

def increment_counter_safe(db, key, max_retries=10):
    for attempt in range(max_retries):
        value, version = db.get(key, with_version=True)
        try:
            db.compare_and_set(
                key, value + 1, expected_version=version
            )
            return value + 1
        except VersionMismatch:
            # another writer beat us; reread and retry
            continue
    raise TooMuchContention(f"CAS failed {max_retries} times")

Why CAS loops are not a universal fix: under heavy contention, the retry rate explodes. If ten clients are all incrementing the same key, each round of CAS only lets one succeed; the other nine retry; of those nine, one succeeds; and so on. The expected number of retries grows linearly with contention. Production counters under load ("likes on a viral post") grind to a halt on CAS; they need a CRDT PN-Counter or a sharded-counter pattern (N sub-keys that aggregate on read). Every concurrency primitive has a contention ceiling, and CAS's ceiling is low.
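The sharded-counter pattern mentioned above can be sketched like so; ShardedCounter is an illustrative name, a dict stands in for the store, and in a real deployment each shard increment would itself be a CAS or CRDT write — the point is that writers contend on 1/N of the keys:

```python
import random

class ShardedCounter:
    """Sketch of the sharded-counter pattern: spread increments over N
    sub-keys so concurrent writers rarely touch the same key; sum the
    shards on read."""
    def __init__(self, db, key, shards=16):
        self.db, self.key, self.shards = db, key, shards

    def increment(self):
        shard = f"{self.key}:{random.randrange(self.shards)}"
        self.db[shard] = self.db.get(shard, 0) + 1  # contends on 1/N of keys

    def read(self):
        return sum(self.db.get(f"{self.key}:{i}", 0)
                   for i in range(self.shards))

db = {}
counter = ShardedCounter(db, "likes:post42")
for _ in range(100):
    counter.increment()
assert counter.read() == 100
```

The trade is write contention for read cost: a read now fans out to N sub-keys, which is why the pattern fits "likes on a viral post" (write-hot, read-cacheable) and not a balance check.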

The naive three lines of Postgres-style code become ten lines of retry logic, plus a pager alert for TooMuchContention, plus a separate implementation path for hot keys. This multiplies across every mutation in your codebase.

An e-commerce checkout under leaderless

Walk the flow — cart, inventory, payment, order — and identify what fits Dynamo-style and what does not.

Cart. cart:priya is a set of item-ids. Additions are CRDT adds; removes are OR-Set removes; "add from two devices" merges as union. Fits perfectly — the workload the paper was written for.
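A stripped-down OR-Set shows why the cart fits; ORSetSketch is a toy version of the CRDT from the CRDTs chapter, with caller-supplied unique tags standing in for generated ones:

```python
class ORSetSketch:
    """Tiny OR-Set sketch: each add carries a unique tag; remove deletes
    only the tags it has observed, so a concurrent re-add survives.
    Merge is set union on both components."""
    def __init__(self):
        self.adds = set()      # (item, tag) pairs
        self.removes = set()   # tags observed at remove time

    def add(self, item, tag):
        self.adds.add((item, tag))

    def remove(self, item):
        self.removes |= {t for (i, t) in self.adds if i == item}

    def merge(self, other):
        m = ORSetSketch()
        m.adds = self.adds | other.adds
        m.removes = self.removes | other.removes
        return m

    def value(self):
        return {i for (i, t) in self.adds if t not in self.removes}

phone, laptop = ORSetSketch(), ORSetSketch()
phone.add("milk", "t1")       # add from the phone
laptop.add("bread", "t2")     # concurrent add from the laptop
merged = phone.merge(laptop)
assert merged.value() == {"milk", "bread"}   # two devices merge as union
```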

Inventory check. inventory:sku_12345 is a shared counter. Concurrent checkouts read the same count, both decrement, both write — you oversell. Does not fit. Move inventory to a strongly-consistent store, or accept oversell and compensate.

Payment. Retries must not double-charge. op_id deduplication is the application-level protocol; the gateway call itself must also be idempotent (an idempotency_key header). Fits only with careful idempotency engineering.

Order placement. An order spans order:12345, inventory:sku_12345 (decrement), payment:abc (capture), cart:priya (clear). In Postgres this is one transaction. In Dynamo it is a saga — multi-step workflow with compensating actions for partial failure. Does not fit natively; requires a saga orchestrator.
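A saga can be sketched as a list of (action, compensation) pairs; run_saga and StepFailed are illustrative names, and a real orchestrator would also persist progress so compensations survive a crash:

```python
class StepFailed(Exception):
    pass

def run_saga(steps):
    """Minimal saga sketch: run actions in order; if one fails, run the
    compensations of the completed steps in reverse, then re-raise."""
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
    except StepFailed:
        for compensation in reversed(done):
            compensation()
        raise

log = []
def charge():
    raise StepFailed("payment declined")

try:
    run_saga([
        (lambda: log.append("reserve inventory"),
         lambda: log.append("release inventory")),
        (charge,
         lambda: log.append("refund payment")),
    ])
except StepFailed:
    pass

assert log == ["reserve inventory", "release inventory"]  # compensation ran
```

Note what is missing compared with a transaction: between "reserve inventory" and "release inventory" the intermediate state is visible to every other reader — sagas give atomicity of outcome, never isolation.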

The mature pattern. Use Postgres for order-critical steps (inventory, payment, final order — one ACID transaction) and Cassandra or DynamoDB for cart, session, telemetry, recommendation cache. Every mature e-commerce company, including Amazon itself, runs polyglot persistence of roughly this shape.

Common confusions

Going deeper

For senior engineers evaluating architectural choices, these three readings sharpen the trade-offs the chapter sketches.

CAP, twelve years later

Brewer's 2012 follow-up clarifies the practical reading of his 2000 conjecture. Partitions are rare; during non-partition operation you can have all three of C, A, and P. CAP's real message is: when partitions occur, you must design for them explicitly — pick A or C per operation, not once for the whole system. Dynamo is overwhelmingly AP; Spanner is overwhelmingly CP; most production systems are a per-operation mix.

PACELC — consistency vs latency even without partition

Abadi's 2012 refinement adds a second trade-off for the non-partition case: Else, pick Latency or Consistency. Even on a healthy network, strong consistency pays quorum round-trips that eventual consistency does not. Dynamo is PA/EL; Spanner is PC/EC. PACELC is more useful than CAP for architectural choices because the non-partition case is the common case.

Strong eventual consistency — the narrowest escape

The cleanest response to the five leaks is SEC via CRDTs. It does not eliminate the leaks — retries, sessions, invariants, schema still live in the application — but it eliminates Leak 1 by construction. Where your data fits a CRDT, use one. The practical limit is that most business data does not fit, and engineering new CRDTs is research-grade work.

Where this leads next

Build 10 ends here. You have walked the full Dynamo stack — leaderless replication (ch.75), consistent hashing (ch.76), N/R/W quorums (ch.77-78), sloppy quorums and hinted handoff (ch.79), gossip (ch.80), anti-entropy (ch.81), LWW and vector clocks (ch.82), CRDTs (ch.83), and the honest summary of what it costs the application (ch.84). The distribution layer is complete.

Build 11 opens on wide-column storage — Cassandra's, HBase's, and DynamoDB's physical layout. Where Build 10 was about how replicas talk to each other, Build 11 is about how one replica stores data on disk — column families, SSTables as storage primitive, partition-key-plus-clustering-key design, tombstone management. The two halves of the architecture were designed to match.

References

  1. Eric Brewer, CAP Twelve Years Later — How the Rules Have Changed, IEEE Computer 45(2), 2012 — the original CAP author revisits the theorem a decade in, introducing the per-operation reading and grounding it in production experience.
  2. Daniel Abadi, Consistency Tradeoffs in Modern Distributed Database System Design — CAP is Only Part of the Story, IEEE Computer 45(2), 2012 — the PACELC framework, sharper than CAP because it captures the common (non-partition) case.
  3. Kyle Kingsbury, Jepsen — Distributed Systems Safety Research — the public test series that has found real consistency bugs in Cassandra, Riak, MongoDB, CockroachDB, and dozens of others. Practitioner ground truth on what these systems actually guarantee.
  4. Martin Kleppmann, Designing Data-Intensive Applications, Chapter 12, O'Reilly 2017 — DDIA's closing chapter on polyglot persistence, eventual-consistency bugs, and how the five leaks play out in real deployments.
  5. Michael Stonebraker, The End of an Architectural Era, VLDB 2007 — the relational-database counter-argument. Stonebraker argues legacy architectures are wrong for modern workloads, but for different reasons than Dynamo does.
  6. Dan Pritchett, BASE — An Acid Alternative, ACM Queue 6(3), 2008 — the industry coinage of BASE (Basically Available, Soft-state, Eventually consistent). A short manifesto for the philosophy Dynamo embodies.